{"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 280], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 911, 280], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [686, 73, 920, 104], \"label\": \"Page-header\"}, {\"bbox_2d\": [208, 123, 691, 171], \"label\": \"Caption\"}, {\"bbox_2d\": [330, 926, 651, 944], \"label\": \"Caption\"}, {\"bbox_2d\": [482, 998, 512, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [119, 123, 852, 908], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [686, 72, 934, 103], \"label\": \"Page-header\"}, {\"bbox_2d\": [124, 120, 853, 909], \"label\": \"Picture\"}, {\"bbox_2d\": [330, 927, 665, 943], \"label\": \"Caption\"}, {\"bbox_2d\": [484, 999, 516, 1014], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [152, 70, 964, 108], \"label\": \"Text\"}, {\"bbox_2d\": [152, 122, 964, 147], \"label\": \"Text\"}, {\"bbox_2d\": [152, 161, 964, 199], \"label\": \"Text\"}, {\"bbox_2d\": [152, 267, 935, 305], \"label\": \"Text\"}, {\"bbox_2d\": [152, 319, 942, 343], \"label\": \"Text\"}, {\"bbox_2d\": [152, 476, 965, 501], \"label\": \"Text\"}, {\"bbox_2d\": [152, 529, 960, 566], \"label\": \"Text\"}, {\"bbox_2d\": [152, 594, 955, 618], \"label\": \"Text\"}, {\"bbox_2d\": [152, 513, 271, 525], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 254, 255, 265], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 225, 193, 235], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 580, 366, 591], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 474, 966, 499], \"label\": \"Text\"}, {\"bbox_2d\": [176, 628, 966, 870], \"label\": \"Table\"}, {\"bbox_2d\": [176, 330, 962, 463], \"label\": \"Table\"}, {\"bbox_2d\": [52, 982, 67, 993], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [152, 70, 966, 109], \"label\": \"Text\"}, {\"bbox_2d\": [153, 122, 966, 147], \"label\": \"Text\"}, {\"bbox_2d\": [152, 161, 966, 199], \"label\": \"Text\"}, {\"bbox_2d\": [176, 267, 940, 305], \"label\": \"Text\"}, {\"bbox_2d\": [152, 475, 966, 500], \"label\": \"Text\"}, {\"bbox_2d\": [176, 527, 964, 566], \"label\": \"Text\"}, {\"bbox_2d\": [176, 593, 959, 618], \"label\": \"Text\"}, {\"bbox_2d\": [52, 982, 67, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [122, 225, 196, 236], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 253, 259, 264], \"label\": \"List-item\"}, {\"bbox_2d\": [153, 514, 276, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [153, 579, 369, 590], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 345, 961, 461], \"label\": \"Table\"}, {\"bbox_2d\": [175, 628, 967, 867], \"label\": \"Table\"}, {\"bbox_2d\": [176, 318, 944, 343], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [152, 107, 691, 121], \"label\": \"Text\"}, {\"bbox_2d\": [152, 128, 798, 142], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 149, 824, 163], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 170, 826, 202], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 209, 819, 223], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 230, 892, 284], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 371, 301, 385], \"label\": \"Text\"}, {\"bbox_2d\": [133, 392, 884, 446], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 513, 722, 526], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 533, 873, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 694, 902, 748], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 673, 301, 687], \"label\": \"Text\"}, {\"bbox_2d\": [152, 856, 378, 869], \"label\": \"Text\"}, {\"bbox_2d\": [133, 876, 897, 930], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 754, 894, 829], \"label\": \"List-item\"}, {\"bbox_2d\": [189, 613, 883, 647], \"label\": \"Section-header\"}, {\"bbox_2d\": [189, 473, 847, 487], \"label\": \"Section-header\"}, {\"bbox_2d\": [189, 311, 836, 345], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 959, 528, 975], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [152, 106, 698, 122], \"label\": \"Text\"}, {\"bbox_2d\": [152, 126, 805, 142], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 146, 831, 163], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 207, 826, 223], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 167, 830, 203], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 227, 897, 284], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 389, 890, 445], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 510, 729, 526], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 530, 878, 586], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 691, 906, 748], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 752, 900, 829], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 873, 903, 929], \"label\": \"List-item\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}, {\"bbox_2d\": [190, 611, 884, 647], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 469, 854, 486], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 308, 837, 344], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 368, 305, 385], \"label\": \"Text\"}, {\"bbox_2d\": [152, 671, 305, 687], \"label\": \"Text\"}, {\"bbox_2d\": [152, 853, 385, 869], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [76, 116, 1014, 908], \"label\": \"Table\"}, {\"bbox_2d\": [873, 909, 939, 922], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [77, 43, 316, 123], \"label\": \"Picture\"}, {\"bbox_2d\": [73, 123, 1020, 907], \"label\": \"Table\"}, {\"bbox_2d\": [873, 906, 934, 922], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 166, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 315, 279, 328], \"label\": \"Text\"}, {\"bbox_2d\": [121, 340, 914, 617], \"label\": \"Text\"}, {\"bbox_2d\": [121, 629, 914, 746], \"label\": \"Text\"}, {\"bbox_2d\": [121, 758, 914, 800], \"label\": \"Text\"}, {\"bbox_2d\": [121, 811, 914, 877], \"label\": \"Text\"}, {\"bbox_2d\": [510, 970, 529, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [365, 907, 672, 934], \"label\": \"Formula\"}, {\"bbox_2d\": [251, 128, 815, 286], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 98, 163, 109], \"label\": \"Text\"}, {\"bbox_2d\": [121, 316, 306, 327], \"label\": \"Text\"}, {\"bbox_2d\": [121, 342, 914, 614], \"label\": \"Text\"}, {\"bbox_2d\": [121, 628, 914, 744], \"label\": \"Text\"}, {\"bbox_2d\": [121, 759, 914, 796], \"label\": \"Text\"}, {\"bbox_2d\": [121, 810, 914, 875], \"label\": \"Text\"}, {\"bbox_2d\": [364, 910, 671, 930], \"label\": \"Formula\"}, {\"bbox_2d\": [229, 130, 806, 282], \"label\": \"Formula\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [198, 65, 936, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [200, 464, 382, 481], \"label\": \"Section-header\"}, {\"bbox_2d\": [200, 628, 518, 644], \"label\": \"Section-header\"}, {\"bbox_2d\": [200, 654, 425, 671], \"label\": \"Section-header\"}, {\"bbox_2d\": [114, 945, 130, 962], \"label\": \"Page-footer\"}, {\"bbox_2d\": [42, 957, 146, 1020], \"label\": \"Picture\"}, {\"bbox_2d\": [198, 216, 948, 457], \"label\": \"Table\"}, {\"bbox_2d\": [201, 482, 502, 575], \"label\": \"Table\"}, {\"bbox_2d\": [201, 677, 947, 899], \"label\": \"Table\"}, {\"bbox_2d\": [238, 486, 500, 547], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [198, 69, 937, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 139, 946, 897], \"label\": \"Table\"}, {\"bbox_2d\": [52, 960, 147, 1015], \"label\": \"Picture\"}, {\"bbox_2d\": [113, 948, 133, 958], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 122, 881, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 164, 542, 182], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 303, 298, 320], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 473, 390, 490], \"label\": \"Section-header\"}, {\"bbox_2d\": [904, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 745, 886, 772], \"label\": \"Text\"}, {\"bbox_2d\": [292, 778, 449, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 869, 428, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 639, 428, 652], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 534, 448, 546], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 402, 929, 440], \"label\": \"Text\"}, {\"bbox_2d\": [292, 328, 745, 339], \"label\": \"Text\"}, {\"bbox_2d\": [292, 240, 908, 284], \"label\": \"Text\"}, {\"bbox_2d\": [292, 193, 925, 221], \"label\": \"Text\"}, {\"bbox_2d\": [302, 354, 505, 396], \"label\": \"Table\"}, {\"bbox_2d\": [309, 560, 418, 622], \"label\": \"Table\"}, {\"bbox_2d\": [309, 666, 408, 708], \"label\": \"Table\"}, {\"bbox_2d\": [309, 805, 408, 955], \"label\": \"Table\"}, {\"bbox_2d\": [292, 498, 886, 526], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 881, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 165, 542, 181], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 193, 924, 222], \"label\": \"Text\"}, {\"bbox_2d\": [293, 240, 908, 285], \"label\": \"Text\"}, {\"bbox_2d\": [293, 326, 744, 340], \"label\": \"Text\"}, {\"bbox_2d\": [293, 497, 886, 526], \"label\": \"Text\"}, {\"bbox_2d\": [293, 743, 886, 772], \"label\": \"Text\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 530, 448, 546], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 637, 427, 653], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 776, 448, 791], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 867, 427, 883], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 303, 298, 320], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 474, 390, 491], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 354, 439, 366], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 369, 417, 381], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 385, 504, 397], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 401, 929, 441], \"label\": \"Text\"}, {\"bbox_2d\": [312, 560, 391, 572], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 576, 410, 588], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 591, 401, 603], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 607, 416, 619], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 667, 382, 679], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 683, 404, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 698, 408, 710], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 805, 385, 817], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 821, 404, 833], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 837, 395, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 897, 388, 909], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 912, 410, 924], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 928, 414, 940], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 944, 416, 956], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 74, 904, 661], \"label\": \"Picture\"}, {\"bbox_2d\": [423, 681, 951, 722], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 681, 391, 706], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 742, 365, 754], \"label\": \"Section-header\"}, {\"bbox_2d\": [423, 759, 951, 802], \"label\": \"Text\"}, {\"bbox_2d\": [423, 809, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [91, 778, 391, 947], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [117, 80, 442, 662], \"label\": \"Picture\"}, {\"bbox_2d\": [562, 78, 908, 666], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 680, 505, 705], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 680, 951, 719], \"label\": \"Caption\"}, {\"bbox_2d\": [232, 743, 364, 753], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 758, 951, 802], \"label\": \"Text\"}, {\"bbox_2d\": [91, 777, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 808, 951, 947], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [938, 39, 973, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [929, 71, 976, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [683, 193, 861, 207], \"label\": \"Section-header\"}, {\"bbox_2d\": [683, 218, 929, 239], \"label\": \"Text\"}, {\"bbox_2d\": [683, 246, 953, 303], \"label\": \"Text\"}, {\"bbox_2d\": [683, 314, 953, 386], \"label\": \"Text\"}, {\"bbox_2d\": [683, 395, 953, 446], \"label\": \"Text\"}, {\"bbox_2d\": [683, 488, 929, 539], \"label\": \"Text\"}, {\"bbox_2d\": [683, 543, 929, 591], \"label\": \"Text\"}, {\"bbox_2d\": [683, 601, 929, 664], \"label\": \"Text\"}, {\"bbox_2d\": [683, 644, 929, 665], \"label\": \"Text\"}, {\"bbox_2d\": [683, 703, 929, 744], \"label\": \"Text\"}, {\"bbox_2d\": [683, 786, 929, 807], \"label\": \"List-item\"}, {\"bbox_2d\": [683, 810, 929, 851], \"label\": \"List-item\"}, {\"bbox_2d\": [683, 859, 929, 890], \"label\": \"Text\"}, {\"bbox_2d\": [943, 994, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [683, 756, 898, 768], \"label\": \"Text\"}, {\"bbox_2d\": [92, 682, 296, 690], \"label\": \"Caption\"}, {\"bbox_2d\": [92, 419, 273, 428], \"label\": \"Caption\"}, {\"bbox_2d\": [92, 124, 208, 133], \"label\": \"Caption\"}, {\"bbox_2d\": [106, 154, 624, 372], \"label\": \"Picture\"}, {\"bbox_2d\": [106, 456, 624, 638], \"label\": \"Picture\"}, {\"bbox_2d\": [104, 711, 624, 935], \"label\": \"Picture\"}, {\"bbox_2d\": [683, 464, 891, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 692, 296, 700], \"label\": \"Caption\"}, {\"bbox_2d\": [683, 647, 929, 665], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [935, 34, 974, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [877, 68, 976, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [1002, 235, 1018, 305], \"label\": \"Page-header\"}, {\"bbox_2d\": [684, 214, 930, 238], \"label\": \"Text\"}, {\"bbox_2d\": [684, 246, 958, 306], \"label\": \"Text\"}, {\"bbox_2d\": [684, 314, 958, 387], \"label\": \"Text\"}, {\"bbox_2d\": [684, 395, 960, 443], \"label\": \"Text\"}, {\"bbox_2d\": [684, 486, 952, 534], \"label\": \"Text\"}, {\"bbox_2d\": [684, 542, 950, 591], \"label\": \"Text\"}, {\"bbox_2d\": [684, 599, 940, 634], \"label\": \"Text\"}, {\"bbox_2d\": [684, 642, 959, 691], \"label\": \"Text\"}, {\"bbox_2d\": [684, 699, 957, 747], \"label\": \"Text\"}, {\"bbox_2d\": [684, 755, 898, 766], \"label\": \"Text\"}, {\"bbox_2d\": [684, 858, 956, 894], \"label\": \"Text\"}, {\"bbox_2d\": [943, 991, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [684, 189, 861, 207], \"label\": \"Section-header\"}, {\"bbox_2d\": [684, 460, 892, 479], \"label\": \"Section-header\"}, {\"bbox_2d\": [684, 771, 954, 807], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 811, 953, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 140, 653, 391], \"label\": \"Picture\"}, {\"bbox_2d\": [93, 120, 208, 133], \"label\": \"Caption\"}, {\"bbox_2d\": [93, 415, 273, 428], \"label\": \"Caption\"}, {\"bbox_2d\": [93, 679, 296, 691], \"label\": \"Caption\"}, {\"bbox_2d\": [79, 436, 653, 655], \"label\": \"Picture\"}, {\"bbox_2d\": [77, 700, 656, 952], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [163, 75, 871, 344], \"label\": \"Picture\"}, {\"bbox_2d\": [94, 362, 911, 526], \"label\": \"Caption\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [95, 354, 940, 527], \"label\": \"Caption\"}, {\"bbox_2d\": [503, 989, 513, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 69, 867, 347], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [58, 17, 173, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [309, 27, 612, 40], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 43, 761, 55], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 100, 1009, 123], \"label\": \"Text\"}, {\"bbox_2d\": [25, 205, 1009, 240], \"label\": \"Text\"}, {\"bbox_2d\": [25, 244, 1009, 290], \"label\": \"Text\"}, {\"bbox_2d\": [25, 414, 1009, 459], \"label\": \"Text\"}, {\"bbox_2d\": [25, 463, 1009, 486], \"label\": \"Text\"}, {\"bbox_2d\": [25, 490, 1009, 539], \"label\": \"Text\"}, {\"bbox_2d\": [25, 543, 1009, 566], \"label\": \"Text\"}, {\"bbox_2d\": [25, 570, 1009, 605], \"label\": \"Text\"}, {\"bbox_2d\": [504, 656, 529, 668], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 126, 1010, 198], \"label\": \"Table\"}, {\"bbox_2d\": [25, 292, 1011, 400], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [58, 17, 173, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [422, 28, 612, 39], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 45, 760, 56], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 101, 1009, 124], \"label\": \"Text\"}, {\"bbox_2d\": [25, 206, 1009, 239], \"label\": \"Text\"}, {\"bbox_2d\": [25, 245, 1009, 290], \"label\": \"Text\"}, {\"bbox_2d\": [25, 414, 1009, 459], \"label\": \"Text\"}, {\"bbox_2d\": [25, 465, 1009, 487], \"label\": \"Text\"}, {\"bbox_2d\": [25, 492, 1009, 537], \"label\": \"Text\"}, {\"bbox_2d\": [25, 543, 1009, 565], \"label\": \"Text\"}, {\"bbox_2d\": [25, 572, 1009, 605], \"label\": \"Text\"}, {\"bbox_2d\": [504, 656, 529, 667], \"label\": \"Page-footer\"}, {\"bbox_2d\": [20, 292, 1013, 407], \"label\": \"Table\"}, {\"bbox_2d\": [25, 124, 1011, 199], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 107, 893, 377], \"label\": \"Text\"}, {\"bbox_2d\": [147, 413, 893, 633], \"label\": \"Text\"}, {\"bbox_2d\": [147, 668, 893, 911], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 108, 889, 376], \"label\": \"Text\"}, {\"bbox_2d\": [148, 415, 888, 631], \"label\": \"Text\"}, {\"bbox_2d\": [148, 668, 887, 911], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [101, 72, 506, 97], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 101, 506, 140], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 144, 506, 179], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 183, 506, 222], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 226, 506, 251], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 255, 506, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 284, 506, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 313, 506, 339], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 343, 506, 369], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 373, 506, 399], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 403, 506, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 433, 506, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 452, 506, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 482, 506, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 512, 506, 538], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 542, 506, 557], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 561, 506, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 591, 506, 617], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 621, 506, 636], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 640, 506, 666], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 670, 506, 700], \"label\": \"Text\"}, {\"bbox_2d\": [536, 540, 929, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 571, 929, 609], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 603, 929, 666], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 670, 929, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 703, 929, 742], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 746, 929, 772], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 776, 929, 802], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 806, 929, 832], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 836, 929, 862], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 866, 929, 892], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 896, 929, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 926, 929, 952], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 956, 929, 982], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 986, 929, 1012], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1016, 929, 1042], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1046, 929, 1072], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1076, 929, 1102], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1106, 929, 1132], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1136, 929, 1162], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1166, 929, 1192], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1196, 929, 1222], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1226, 929, 1252], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1256, 929, 1282], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1286, 929, 1312], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1316, 929, 1342], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1346, 929, 1372], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1376, 929, 1402], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1406, 929, 1432], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1436, 929, 1462], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1466, 929, 1492], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1496, 929, 1522], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1526, 929, 1552], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1556, 929, 1582], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1586, 929, 1612], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1616, 929, 1642], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1646, 929, 1672], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1676, 929, 1702], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1706, 929, 1732], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 1736, 929, 1762], \"", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [567, 72, 732, 84], \"label\": \"Text\"}, {\"bbox_2d\": [99, 72, 506, 97], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 99, 506, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 140, 506, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 168, 506, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 195, 501, 220], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 222, 506, 248], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 250, 506, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 277, 388, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 291, 436, 303], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 304, 474, 316], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 318, 506, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 345, 506, 371], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 373, 436, 385], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 387, 462, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 400, 506, 426], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 428, 506, 453], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 455, 506, 480], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 482, 506, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 510, 506, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 537, 493, 549], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 551, 506, 590], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 551, 951, 576], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 537, 917, 549], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 482, 951, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 428, 951, 480], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 400, 951, 426], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 373, 951, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 345, 951, 371], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 318, 951, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 291, 951, 316], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 263, 951, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 236, 951, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 209, 951, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 195, 847, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 86, 951, 111], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 113, 951, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 140, 950, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 154, 819, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 168, 951, 193], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 85, 973, 159], \"label\": \"Title\"}, {\"bbox_2d\": [87, 167, 140, 187], \"label\": \"Text\"}, {\"bbox_2d\": [87, 194, 348, 215], \"label\": \"Text\"}, {\"bbox_2d\": [87, 222, 157, 243], \"label\": \"Text\"}, {\"bbox_2d\": [87, 250, 682, 272], \"label\": \"Text\"}, {\"bbox_2d\": [87, 280, 161, 301], \"label\": \"Text\"}, {\"bbox_2d\": [87, 308, 478, 330], \"label\": \"Text\"}, {\"bbox_2d\": [87, 352, 954, 401], \"label\": \"Text\"}, {\"bbox_2d\": [87, 408, 227, 428], \"label\": \"Text\"}, {\"bbox_2d\": [87, 435, 954, 484], \"label\": \"Text\"}, {\"bbox_2d\": [87, 488, 248, 510], \"label\": \"Text\"}, {\"bbox_2d\": [109, 523, 355, 543], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 545, 797, 565], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 566, 304, 586], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 587, 467, 608], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 609, 322, 629], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 630, 264, 650], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 651, 401, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 673, 422, 693], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 694, 253, 714], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 726, 345, 748], \"label\": \"Text\"}, {\"bbox_2d\": [87, 757, 307, 778], \"label\": \"Text\"}, {\"bbox_2d\": [87, 798, 407, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 820, 421, 840], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 841, 332, 861], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 862, 380, 882], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 894, 982, 944], \"label\": \"Text\"}, {\"bbox_2d\": [87, 952, 365, 973], \"label\": \"Text\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 85, 971, 158], \"label\": \"Title\"}, {\"bbox_2d\": [87, 167, 139, 186], \"label\": \"Text\"}, {\"bbox_2d\": [87, 195, 347, 215], \"label\": \"Text\"}, {\"bbox_2d\": [87, 224, 156, 243], \"label\": \"Text\"}, {\"bbox_2d\": [87, 252, 683, 272], \"label\": \"Text\"}, {\"bbox_2d\": [87, 310, 477, 329], \"label\": \"Text\"}, {\"bbox_2d\": [87, 352, 954, 401], \"label\": \"Text\"}, {\"bbox_2d\": [87, 435, 953, 484], \"label\": \"Text\"}, {\"bbox_2d\": [87, 489, 254, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 407, 231, 426], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 281, 160, 300], \"label\": \"Section-header\"}, {\"bbox_2d\": [108, 524, 354, 543], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 545, 798, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 566, 304, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 586, 467, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 607, 322, 627], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 628, 264, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 649, 402, 669], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 670, 421, 690], \"label\": \"List-item\"}, {\"bbox_2d\": [108, 691, 253, 711], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 777, 883, 797], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 798, 405, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 819, 420, 839], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 840, 332, 860], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 861, 379, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 756, 306, 776], \"label\": \"Text\"}, {\"bbox_2d\": [87, 895, 982, 944], \"label\": \"Text\"}, {\"bbox_2d\": [87, 953, 365, 972], \"label\": \"Text\"}, {\"bbox_2d\": [87, 728, 344, 748], \"label\": \"Section-header\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [491, 72, 538, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 96, 156, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 112, 94, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 205, 159, 215], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 127, 1009, 161], \"label\": \"Text\"}, {\"bbox_2d\": [25, 170, 1009, 191], \"label\": \"Text\"}, {\"bbox_2d\": [25, 221, 1009, 266], \"label\": \"Text\"}, {\"bbox_2d\": [25, 275, 1009, 342], \"label\": \"Text\"}, {\"bbox_2d\": [25, 351, 1009, 384], \"label\": \"Text\"}, {\"bbox_2d\": [25, 394, 1009, 427], \"label\": \"Text\"}, {\"bbox_2d\": [25, 436, 1009, 495], \"label\": \"Text\"}, {\"bbox_2d\": [25, 504, 104, 514], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 521, 1009, 622], \"label\": \"Text\"}, {\"bbox_2d\": [513, 646, 522, 656], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 25, 155, 35], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [491, 71, 539, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 95, 157, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 111, 93, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 204, 158, 215], \"label\": \"Section-header\"}, {\"bbox_2d\": [40, 505, 104, 516], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 126, 1009, 160], \"label\": \"Text\"}, {\"bbox_2d\": [25, 169, 1009, 191], \"label\": \"Text\"}, {\"bbox_2d\": [25, 220, 1009, 265], \"label\": \"Text\"}, {\"bbox_2d\": [25, 274, 1009, 341], \"label\": \"Text\"}, {\"bbox_2d\": [25, 350, 1009, 384], \"label\": \"Text\"}, {\"bbox_2d\": [25, 392, 1009, 426], \"label\": \"Text\"}, {\"bbox_2d\": [25, 435, 1009, 492], \"label\": \"Text\"}, {\"bbox_2d\": [25, 521, 1009, 622], \"label\": \"Text\"}, {\"bbox_2d\": [513, 645, 521, 656], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [136, 261, 411, 270], \"label\": \"Text\"}, {\"bbox_2d\": [136, 271, 911, 289], \"label\": \"Text\"}, {\"bbox_2d\": [136, 371, 911, 411], \"label\": \"Text\"}, {\"bbox_2d\": [149, 416, 891, 442], \"label\": \"Text\"}, {\"bbox_2d\": [172, 446, 802, 456], \"label\": \"Text\"}, {\"bbox_2d\": [150, 463, 885, 489], \"label\": \"Text\"}, {\"bbox_2d\": [136, 939, 886, 947], \"label\": \"Text\"}, {\"bbox_2d\": [136, 957, 855, 979], \"label\": \"Text\"}, {\"bbox_2d\": [172, 981, 502, 989], \"label\": \"Text\"}, {\"bbox_2d\": [136, 309, 420, 318], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 322, 254, 332], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 510, 192, 520], \"label\": \"Section-header\"}, {\"bbox_2d\": [136, 521, 916, 716], \"label\": \"Table\"}, {\"bbox_2d\": [136, 739, 915, 932], \"label\": \"Table\"}, {\"bbox_2d\": [45, 1000, 170, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [45, 15, 83, 27], \"label\": \"Page-header\"}, {\"bbox_2d\": [136, 336, 637, 346], \"label\": \"List-item\"}, {\"bbox_2d\": [149, 353, 474, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [149, 354, 472, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [149, 354, 637, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [136, 101, 915, 250], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 255, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [133, 99, 921, 255], \"label\": \"Table\"}, {\"bbox_2d\": [137, 262, 409, 270], \"label\": \"Text\"}, {\"bbox_2d\": [170, 272, 913, 291], \"label\": \"Text\"}, {\"bbox_2d\": [113, 323, 255, 333], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 307, 419, 317], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 338, 639, 349], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 354, 472, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 370, 914, 380], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 385, 908, 412], \"label\": \"Text\"}, {\"bbox_2d\": [149, 417, 892, 443], \"label\": \"Text\"}, {\"bbox_2d\": [173, 448, 807, 459], \"label\": \"Text\"}, {\"bbox_2d\": [149, 464, 889, 490], \"label\": \"Text\"}, {\"bbox_2d\": [113, 511, 192, 521], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 523, 919, 718], \"label\": \"Table\"}, {\"bbox_2d\": [132, 739, 919, 933], \"label\": \"Table\"}, {\"bbox_2d\": [137, 939, 887, 958], \"label\": \"Text\"}, {\"bbox_2d\": [170, 960, 852, 979], \"label\": \"Text\"}, {\"bbox_2d\": [170, 981, 502, 989], \"label\": \"Text\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [61, 39, 240, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [79, 192, 449, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 224, 164, 237], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 364, 335, 375], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 517, 398, 528], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 241, 500, 251], \"label\": \"Text\"}, {\"bbox_2d\": [79, 260, 500, 270], \"label\": \"Text\"}, {\"bbox_2d\": [79, 380, 787, 490], \"label\": \"Text\"}, {\"bbox_2d\": [79, 499, 400, 528], \"label\": \"Text\"}, {\"bbox_2d\": [79, 535, 956, 632], \"label\": \"Table\"}, {\"bbox_2d\": [23, 977, 193, 996], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [57, 34, 240, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 189, 450, 207], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 221, 166, 237], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 361, 336, 376], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 512, 399, 528], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 241, 500, 252], \"label\": \"Text\"}, {\"bbox_2d\": [105, 260, 407, 335], \"label\": \"Table\"}, {\"bbox_2d\": [106, 380, 789, 487], \"label\": \"Table\"}, {\"bbox_2d\": [75, 533, 959, 633], \"label\": \"Table\"}, {\"bbox_2d\": [78, 984, 193, 995], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [351, 464, 684, 477], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 519, 914, 611], \"label\": \"Text\"}, {\"bbox_2d\": [121, 736, 914, 931], \"label\": \"Text\"}, {\"bbox_2d\": [509, 970, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 653, 914, 709], \"label\": \"Section-header\"}, {\"bbox_2d\": [207, 117, 863, 441], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [351, 465, 684, 477], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 520, 914, 610], \"label\": \"Text\"}, {\"bbox_2d\": [121, 735, 914, 929], \"label\": \"Text\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 653, 914, 708], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 110, 865, 445], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [129, 18, 255, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [40, 14, 82, 26], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 99, 708, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 116, 914, 477], \"label\": \"Table\"}, {\"bbox_2d\": [42, 999, 169, 1015], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [37, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 255, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 101, 708, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [70, 116, 915, 477], \"label\": \"Table\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [86, 27, 212, 43], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 47, 438, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 65, 628, 77], \"label\": \"Text\"}, {\"bbox_2d\": [86, 95, 518, 108], \"label\": \"Text\"}, {\"bbox_2d\": [86, 747, 891, 768], \"label\": \"Text\"}, {\"bbox_2d\": [86, 771, 909, 782], \"label\": \"Text\"}, {\"bbox_2d\": [86, 120, 967, 742], \"label\": \"Table\"}, {\"bbox_2d\": [801, 997, 964, 1005], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 22, 439, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 62, 628, 78], \"label\": \"Text\"}, {\"bbox_2d\": [85, 92, 518, 107], \"label\": \"Text\"}, {\"bbox_2d\": [85, 118, 969, 742], \"label\": \"Table\"}, {\"bbox_2d\": [85, 744, 894, 768], \"label\": \"Text\"}, {\"bbox_2d\": [85, 769, 910, 781], \"label\": \"Text\"}, {\"bbox_2d\": [802, 996, 965, 1006], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [683, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [110, 70, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 103, 150, 118], \"label\": \"Text\"}, {\"bbox_2d\": [121, 130, 911, 144], \"label\": \"Text\"}, {\"bbox_2d\": [121, 159, 910, 230], \"label\": \"Text\"}, {\"bbox_2d\": [516, 981, 526, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [684, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [110, 69, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [122, 104, 151, 119], \"label\": \"Text\"}, {\"bbox_2d\": [171, 128, 910, 146], \"label\": \"Text\"}, {\"bbox_2d\": [122, 156, 910, 230], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 35, 409, 88], \"label\": \"Picture\"}, {\"bbox_2d\": [123, 139, 918, 197], \"label\": \"Text\"}, {\"bbox_2d\": [126, 217, 544, 231], \"label\": \"Text\"}, {\"bbox_2d\": [123, 502, 916, 531], \"label\": \"Text\"}, {\"bbox_2d\": [123, 551, 917, 579], \"label\": \"Text\"}, {\"bbox_2d\": [123, 599, 917, 627], \"label\": \"Text\"}, {\"bbox_2d\": [123, 737, 917, 869], \"label\": \"Text\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [123, 644, 351, 657], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 660, 917, 688], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 691, 917, 718], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 469, 780, 485], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 249, 917, 292], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 295, 917, 338], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 341, 917, 383], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 386, 917, 442], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [123, 136, 918, 196], \"label\": \"Text\"}, {\"bbox_2d\": [129, 215, 551, 230], \"label\": \"Text\"}, {\"bbox_2d\": [123, 501, 912, 531], \"label\": \"Text\"}, {\"bbox_2d\": [123, 548, 918, 578], \"label\": \"Text\"}, {\"bbox_2d\": [123, 595, 918, 625], \"label\": \"Text\"}, {\"bbox_2d\": [123, 734, 918, 869], \"label\": \"Text\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [154, 642, 358, 657], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 657, 918, 687], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 687, 918, 717], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 247, 918, 292], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 292, 918, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 337, 918, 382], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 382, 918, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 466, 790, 484], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 34, 409, 86], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [628, 53, 645, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [464, 53, 596, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [310, 53, 323, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 85, 506, 285], \"label\": \"Text\"}, {\"bbox_2d\": [123, 287, 506, 434], \"label\": \"Text\"}, {\"bbox_2d\": [123, 435, 506, 635], \"label\": \"Text\"}, {\"bbox_2d\": [123, 636, 506, 926], \"label\": \"Text\"}, {\"bbox_2d\": [555, 83, 937, 141], \"label\": \"Text\"}, {\"bbox_2d\": [555, 144, 937, 183], \"label\": \"Text\"}, {\"bbox_2d\": [555, 186, 937, 281], \"label\": \"Text\"}, {\"bbox_2d\": [555, 284, 937, 347], \"label\": \"Text\"}, {\"bbox_2d\": [555, 350, 937, 561], \"label\": \"Text\"}, {\"bbox_2d\": [555, 564, 937, 632], \"label\": \"Text\"}, {\"bbox_2d\": [555, 635, 937, 926], \"label\": \"Text\"}, {\"bbox_2d\": [526, 988, 533, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [309, 55, 324, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [463, 54, 595, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [740, 55, 755, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 84, 514, 286], \"label\": \"Text\"}, {\"bbox_2d\": [123, 287, 519, 431], \"label\": \"Text\"}, {\"bbox_2d\": [123, 433, 510, 635], \"label\": \"Text\"}, {\"bbox_2d\": [123, 636, 517, 925], \"label\": \"Text\"}, {\"bbox_2d\": [554, 84, 950, 141], \"label\": \"Text\"}, {\"bbox_2d\": [554, 142, 944, 185], \"label\": \"Text\"}, {\"bbox_2d\": [554, 186, 948, 286], \"label\": \"Text\"}, {\"bbox_2d\": [554, 288, 933, 344], \"label\": \"Text\"}, {\"bbox_2d\": [554, 346, 947, 562], \"label\": \"Text\"}, {\"bbox_2d\": [554, 564, 936, 635], \"label\": \"Text\"}, {\"bbox_2d\": [554, 636, 958, 925], \"label\": \"Text\"}, {\"bbox_2d\": [525, 989, 533, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [518, 146, 671, 158], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 148, 255, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 161, 914, 330], \"label\": \"Table\"}, {\"bbox_2d\": [518, 376, 914, 523], \"label\": \"Table\"}, {\"bbox_2d\": [76, 335, 465, 392], \"label\": \"Text\"}, {\"bbox_2d\": [518, 362, 902, 369], \"label\": \"Text\"}, {\"bbox_2d\": [518, 369, 877, 376], \"label\": \"Text\"}, {\"bbox_2d\": [76, 397, 471, 478], \"label\": \"Table\"}, {\"bbox_2d\": [76, 482, 321, 493], \"label\": \"Text\"}, {\"bbox_2d\": [76, 502, 471, 544], \"label\": \"Table\"}, {\"bbox_2d\": [76, 559, 471, 743], \"label\": \"Table\"}, {\"bbox_2d\": [518, 555, 914, 724], \"label\": \"Table\"}, {\"bbox_2d\": [518, 539, 688, 551], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 728, 914, 767], \"label\": \"Text\"}, {\"bbox_2d\": [518, 772, 883, 794], \"label\": \"Text\"}, {\"bbox_2d\": [518, 798, 914, 935], \"label\": \"Table\"}, {\"bbox_2d\": [76, 758, 468, 817], \"label\": \"Text\"}, {\"bbox_2d\": [518, 940, 914, 962], \"label\": \"Text\"}, {\"bbox_2d\": [76, 954, 471, 968], \"label\": \"Text\"}, {\"bbox_2d\": [76, 825, 471, 907], \"label\": \"Table\"}, {\"bbox_2d\": [518, 909, 914, 935], \"label\": \"Table\"}, {\"bbox_2d\": [76, 909, 328, 919], \"label\": \"Text\"}, {\"bbox_2d\": [76, 940, 471, 968], \"label\": \"Table\"}, {\"bbox_2d\": [322, 981, 914, 991], \"label\": \"Text\"}, {\"bbox_2d\": [60, 998, 169, 1014], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [72, 149, 256, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 149, 676, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 562, 255, 572], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 541, 694, 551], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 335, 469, 391], \"label\": \"Text\"}, {\"bbox_2d\": [517, 335, 902, 356], \"label\": \"Text\"}, {\"bbox_2d\": [534, 359, 878, 368], \"label\": \"Text\"}, {\"bbox_2d\": [72, 484, 319, 493], \"label\": \"Text\"}, {\"bbox_2d\": [72, 748, 268, 758], \"label\": \"Text\"}, {\"bbox_2d\": [72, 760, 472, 817], \"label\": \"Text\"}, {\"bbox_2d\": [517, 727, 916, 772], \"label\": \"Text\"}, {\"bbox_2d\": [517, 774, 881, 795], \"label\": \"Text\"}, {\"bbox_2d\": [321, 981, 916, 991], \"label\": \"Text\"}, {\"bbox_2d\": [58, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [72, 909, 328, 918], \"label\": \"Text\"}, {\"bbox_2d\": [518, 803, 915, 882], \"label\": \"Table\"}, {\"bbox_2d\": [517, 894, 915, 936], \"label\": \"Table\"}, {\"bbox_2d\": [517, 940, 912, 961], \"label\": \"Text\"}, {\"bbox_2d\": [71, 825, 470, 905], \"label\": \"Table\"}, {\"bbox_2d\": [72, 927, 470, 968], \"label\": \"Table\"}, {\"bbox_2d\": [71, 574, 469, 744], \"label\": \"Table\"}, {\"bbox_2d\": [517, 553, 916, 722], \"label\": \"Table\"}, {\"bbox_2d\": [71, 502, 469, 543], \"label\": \"Table\"}, {\"bbox_2d\": [518, 468, 915, 522], \"label\": \"Table\"}, {\"bbox_2d\": [517, 376, 915, 455], \"label\": \"Table\"}, {\"bbox_2d\": [71, 400, 470, 479], \"label\": \"Table\"}, {\"bbox_2d\": [71, 161, 469, 332], \"label\": \"Table\"}, {\"bbox_2d\": [517, 161, 917, 330], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [60, 123, 962, 538], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [65, 123, 961, 540], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [428, 77, 608, 92], \"label\": \"Page-header\"}, {\"bbox_2d\": [251, 136, 786, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 204, 294, 222], \"label\": \"Section-header\"}, {\"bbox_2d\": [389, 205, 456, 237], \"label\": \"Picture\"}, {\"bbox_2d\": [662, 198, 817, 234], \"label\": \"Picture\"}, {\"bbox_2d\": [115, 254, 371, 543], \"label\": \"Text\"}, {\"bbox_2d\": [390, 254, 647, 369], \"label\": \"Text\"}, {\"bbox_2d\": [664, 254, 922, 578], \"label\": \"Text\"}, {\"bbox_2d\": [115, 566, 303, 573], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 576, 372, 609], \"label\": \"Text\"}, {\"bbox_2d\": [115, 614, 371, 687], \"label\": \"Text\"}, {\"bbox_2d\": [390, 566, 627, 694], \"label\": \"Text\"}, {\"bbox_2d\": [664, 596, 851, 608], \"label\": \"Section-header\"}, {\"bbox_2d\": [390, 713, 577, 726], \"label\": \"Section-header\"}, {\"bbox_2d\": [664, 785, 887, 796], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 708, 234, 719], \"label\": \"Section-header\"}, {\"bbox_2d\": [390, 383, 577, 395], \"label\": \"Section-header\"}, {\"bbox_2d\": [390, 409, 648, 476], \"label\": \"Text\"}, {\"bbox_2d\": [390, 493, 600, 503], \"label\": \"Section-header\"}, {\"bbox_2d\": [390, 531, 482, 550], \"label\": \"Section-header\"}, {\"bbox_2d\": [664, 596, 920, 621], \"label\": \"Text\"}, {\"bbox_2d\": [664, 637, 839, 649], \"label\": \"Text\"}, {\"bbox_2d\": [664, 655, 921, 708], \"label\": \"Text\"}, {\"bbox_2d\": [664, 716, 924, 765], \"label\": \"Text\"}, {\"bbox_2d\": [390, 734, 648, 843], \"label\": \"Text\"}, {\"bbox_2d\": [390, 864, 515, 873], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 999, 525, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [428, 78, 607, 89], \"label\": \"Text\"}, {\"bbox_2d\": [251, 135, 784, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 254, 374, 538], \"label\": \"Text\"}, {\"bbox_2d\": [115, 578, 373, 608], \"label\": \"Text\"}, {\"bbox_2d\": [115, 617, 373, 686], \"label\": \"Text\"}, {\"bbox_2d\": [115, 709, 234, 717], \"label\": \"Text\"}, {\"bbox_2d\": [390, 733, 648, 841], \"label\": \"Text\"}, {\"bbox_2d\": [390, 864, 514, 873], \"label\": \"Text\"}, {\"bbox_2d\": [390, 565, 648, 693], \"label\": \"Text\"}, {\"bbox_2d\": [390, 402, 649, 472], \"label\": \"Text\"}, {\"bbox_2d\": [390, 495, 597, 503], \"label\": \"Text\"}, {\"bbox_2d\": [390, 254, 648, 362], \"label\": \"Text\"}, {\"bbox_2d\": [665, 254, 924, 577], \"label\": \"Text\"}, {\"bbox_2d\": [665, 617, 924, 647], \"label\": \"Text\"}, {\"bbox_2d\": [665, 656, 923, 706], \"label\": \"Text\"}, {\"bbox_2d\": [665, 714, 923, 764], \"label\": \"Text\"}, {\"bbox_2d\": [665, 787, 886, 795], \"label\": \"Text\"}, {\"bbox_2d\": [665, 598, 851, 608], \"label\": \"Section-header\"}, {\"bbox_2d\": [390, 383, 576, 394], \"label\": \"Section-header\"}, {\"bbox_2d\": [390, 714, 576, 724], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 559, 301, 569], \"label\": \"Section-header\"}, {\"bbox_2d\": [512, 999, 523, 1008], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 204, 298, 220], \"label\": \"Picture\"}, {\"bbox_2d\": [388, 201, 455, 233], \"label\": \"Picture\"}, {\"bbox_2d\": [662, 198, 816, 239], \"label\": \"Picture\"}, {\"bbox_2d\": [389, 533, 480, 552], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [137, 96, 381, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 171, 271, 183], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 384, 291, 396], \"label\": \"Section-header\"}, {\"bbox_2d\": [497, 940, 537, 954], \"label\": \"Page-footer\"}, {\"bbox_2d\": [137, 120, 917, 161], \"label\": \"Text\"}, {\"bbox_2d\": [137, 195, 917, 236], \"label\": \"Text\"}, {\"bbox_2d\": [137, 325, 917, 351], \"label\": \"Text\"}, {\"bbox_2d\": [137, 407, 917, 448], \"label\": \"Text\"}, {\"bbox_2d\": [182, 355, 443, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 239, 917, 265], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 268, 917, 309], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [137, 405, 918, 448], \"label\": \"Text\"}, {\"bbox_2d\": [137, 381, 294, 397], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 168, 270, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 93, 382, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 117, 917, 161], \"label\": \"Text\"}, {\"bbox_2d\": [137, 193, 918, 237], \"label\": \"Text\"}, {\"bbox_2d\": [137, 236, 916, 266], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 265, 920, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 323, 917, 352], \"label\": \"Text\"}, {\"bbox_2d\": [182, 352, 448, 367], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [120, 35, 411, 91], \"label\": \"Picture\"}, {\"bbox_2d\": [415, 208, 620, 222], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 239, 878, 930], \"label\": \"Table\"}, {\"bbox_2d\": [536, 970, 546, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [415, 207, 626, 221], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 238, 880, 929], \"label\": \"Table\"}, {\"bbox_2d\": [537, 970, 547, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [122, 34, 408, 86], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [79, 75, 976, 814], \"label\": \"Picture\"}, {\"bbox_2d\": [957, 988, 975, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [80, 908, 299, 969], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [81, 244, 215, 293], \"label\": \"Caption\"}, {\"bbox_2d\": [318, 244, 476, 293], \"label\": \"Caption\"}, {\"bbox_2d\": [556, 244, 731, 293], \"label\": \"Caption\"}, {\"bbox_2d\": [793, 244, 974, 293], \"label\": \"Caption\"}, {\"bbox_2d\": [793, 490, 952, 539], \"label\": \"Caption\"}, {\"bbox_2d\": [556, 490, 732, 552], \"label\": \"Caption\"}, {\"bbox_2d\": [318, 490, 465, 526], \"label\": \"Caption\"}, {\"bbox_2d\": [81, 490, 229, 552], \"label\": \"Caption\"}, {\"bbox_2d\": [81, 735, 249, 784], \"label\": \"Caption\"}, {\"bbox_2d\": [318, 735, 439, 771], \"label\": \"Caption\"}, {\"bbox_2d\": [556, 735, 716, 797], \"label\": \"Caption\"}, {\"bbox_2d\": [793, 735, 967, 810], \"label\": \"Caption\"}, {\"bbox_2d\": [793, 300, 824, 308], \"label\": \"Text\"}, {\"bbox_2d\": [556, 300, 586, 308], \"label\": \"Text\"}, {\"bbox_2d\": [318, 533, 377, 541], \"label\": \"Text\"}, {\"bbox_2d\": [793, 546, 855, 554], \"label\": \"Text\"}, {\"bbox_2d\": [81, 300, 145, 308], \"label\": \"Text\"}, {\"bbox_2d\": [81, 792, 111, 799], \"label\": \"Text\"}, {\"bbox_2d\": [318, 779, 379, 786], \"label\": \"Text\"}, {\"bbox_2d\": [556, 805, 618, 812], \"label\": \"Text\"}, {\"bbox_2d\": [958, 994, 973, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [81, 912, 183, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 924, 298, 931], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 935, 192, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 947, 268, 955], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 959, 210, 967], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 76, 262, 235], \"label\": \"Picture\"}, {\"bbox_2d\": [319, 76, 500, 234], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 76, 738, 234], \"label\": \"Picture\"}, {\"bbox_2d\": [793, 76, 975, 235], \"label\": \"Picture\"}, {\"bbox_2d\": [794, 322, 976, 480], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 322, 738, 480], \"label\": \"Picture\"}, {\"bbox_2d\": [318, 322, 499, 480], \"label\": \"Picture\"}, {\"bbox_2d\": [80, 322, 262, 480], \"label\": \"Picture\"}, {\"bbox_2d\": [80, 567, 262, 725], \"label\": \"Picture\"}, {\"bbox_2d\": [318, 568, 499, 725], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 569, 737, 727], \"label\": \"Picture\"}, {\"bbox_2d\": [794, 568, 976, 726], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [512, 974, 523, 990], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 88, 226, 103], \"label\": \"Text\"}, {\"bbox_2d\": [156, 148, 272, 163], \"label\": \"Text\"}, {\"bbox_2d\": [589, 117, 877, 163], \"label\": \"Text\"}, {\"bbox_2d\": [419, 192, 616, 207], \"label\": \"Section-header\"}, {\"bbox_2d\": [369, 215, 665, 230], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 239, 880, 283], \"label\": \"Text\"}, {\"bbox_2d\": [156, 293, 880, 367], \"label\": \"Text\"}, {\"bbox_2d\": [156, 377, 880, 451], \"label\": \"Text\"}, {\"bbox_2d\": [156, 461, 880, 490], \"label\": \"Text\"}, {\"bbox_2d\": [156, 500, 880, 530], \"label\": \"Text\"}, {\"bbox_2d\": [156, 558, 295, 570], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 576, 880, 603], \"label\": \"Footnote\"}]```", "label": "```json[{\"bbox_2d\": [589, 121, 881, 162], \"label\": \"Text\"}, {\"bbox_2d\": [156, 150, 277, 162], \"label\": \"Text\"}, {\"bbox_2d\": [156, 91, 230, 103], \"label\": \"Text\"}, {\"bbox_2d\": [419, 195, 621, 207], \"label\": \"Section-header\"}, {\"bbox_2d\": [369, 217, 671, 229], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 242, 884, 283], \"label\": \"Text\"}, {\"bbox_2d\": [156, 296, 884, 367], \"label\": \"Text\"}, {\"bbox_2d\": [156, 380, 884, 451], \"label\": \"Text\"}, {\"bbox_2d\": [156, 464, 884, 491], \"label\": \"Text\"}, {\"bbox_2d\": [156, 504, 884, 530], \"label\": \"Text\"}, {\"bbox_2d\": [156, 562, 300, 571], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 580, 883, 603], \"label\": \"Footnote\"}, {\"bbox_2d\": [513, 979, 522, 991], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 139, 496, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 232, 510, 314], \"label\": \"Text\"}, {\"bbox_2d\": [130, 345, 525, 655], \"label\": \"Table\"}, {\"bbox_2d\": [555, 524, 947, 608], \"label\": \"Table\"}, {\"bbox_2d\": [131, 680, 522, 896], \"label\": \"Text\"}, {\"bbox_2d\": [555, 746, 935, 896], \"label\": \"Text\"}, {\"bbox_2d\": [555, 636, 935, 694], \"label\": \"Text\"}, {\"bbox_2d\": [555, 725, 819, 739], \"label\": \"Section-header\"}, {\"bbox_2d\": [555, 389, 946, 493], \"label\": \"Text\"}, {\"bbox_2d\": [555, 232, 945, 358], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [91, 132, 497, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 229, 511, 314], \"label\": \"Text\"}, {\"bbox_2d\": [130, 677, 523, 896], \"label\": \"Text\"}, {\"bbox_2d\": [555, 744, 937, 896], \"label\": \"Text\"}, {\"bbox_2d\": [555, 632, 938, 695], \"label\": \"Text\"}, {\"bbox_2d\": [555, 386, 947, 493], \"label\": \"Text\"}, {\"bbox_2d\": [555, 229, 945, 359], \"label\": \"Text\"}, {\"bbox_2d\": [555, 723, 820, 740], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 342, 528, 657], \"label\": \"Table\"}, {\"bbox_2d\": [562, 518, 951, 609], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [538, 132, 908, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [1016, 137, 1029, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 230, 331, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [512, 230, 765, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 255, 483, 294], \"label\": \"Text\"}, {\"bbox_2d\": [512, 255, 908, 384], \"label\": \"Text\"}, {\"bbox_2d\": [88, 545, 475, 585], \"label\": \"Text\"}, {\"bbox_2d\": [88, 612, 483, 898], \"label\": \"Text\"}, {\"bbox_2d\": [512, 523, 908, 720], \"label\": \"Text\"}, {\"bbox_2d\": [512, 410, 904, 496], \"label\": \"Text\"}, {\"bbox_2d\": [512, 746, 904, 919], \"label\": \"Text\"}, {\"bbox_2d\": [97, 322, 480, 520], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [539, 132, 907, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [1016, 137, 1028, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 231, 332, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [512, 231, 766, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 253, 480, 293], \"label\": \"Text\"}, {\"bbox_2d\": [87, 544, 467, 584], \"label\": \"Text\"}, {\"bbox_2d\": [87, 611, 479, 897], \"label\": \"Text\"}, {\"bbox_2d\": [512, 745, 901, 920], \"label\": \"Text\"}, {\"bbox_2d\": [512, 521, 901, 718], \"label\": \"Text\"}, {\"bbox_2d\": [512, 409, 898, 495], \"label\": \"Text\"}, {\"bbox_2d\": [512, 253, 903, 383], \"label\": \"Text\"}, {\"bbox_2d\": [98, 319, 477, 520], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 70, 951, 107], \"label\": \"Text\"}, {\"bbox_2d\": [91, 273, 506, 367], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 602, 506, 710], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 174, 951, 248], \"label\": \"Text\"}, {\"bbox_2d\": [536, 370, 951, 439], \"label\": \"Text\"}, {\"bbox_2d\": [536, 599, 951, 717], \"label\": \"Text\"}, {\"bbox_2d\": [91, 850, 506, 935], \"label\": \"Text\"}, {\"bbox_2d\": [536, 850, 951, 935], \"label\": \"Text\"}, {\"bbox_2d\": [621, 149, 951, 174], \"label\": \"Formula\"}, {\"bbox_2d\": [621, 267, 951, 291], \"label\": \"Formula\"}, {\"bbox_2d\": [621, 297, 951, 322], \"label\": \"Formula\"}, {\"bbox_2d\": [621, 328, 951, 353], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 460, 951, 484], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 488, 951, 512], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 520, 566, 533], \"label\": \"Text\"}, {\"bbox_2d\": [567, 542, 951, 576], \"label\": \"Formula\"}, {\"bbox_2d\": [253, 777, 951, 817], \"label\": \"Formula\"}, {\"bbox_2d\": [108, 64, 499, 261], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 397, 506, 585], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 108], \"label\": \"Text\"}, {\"bbox_2d\": [536, 171, 951, 246], \"label\": \"Text\"}, {\"bbox_2d\": [536, 371, 951, 444], \"label\": \"Text\"}, {\"bbox_2d\": [91, 271, 506, 382], \"label\": \"Caption\"}, {\"bbox_2d\": [95, 66, 501, 258], \"label\": \"Picture\"}, {\"bbox_2d\": [631, 115, 951, 164], \"label\": \"Formula\"}, {\"bbox_2d\": [618, 262, 951, 293], \"label\": \"Formula\"}, {\"bbox_2d\": [620, 294, 951, 325], \"label\": \"Formula\"}, {\"bbox_2d\": [619, 326, 951, 357], \"label\": \"Formula\"}, {\"bbox_2d\": [571, 462, 951, 484], \"label\": \"Formula\"}, {\"bbox_2d\": [572, 484, 951, 507], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 522, 563, 535], \"label\": \"Text\"}, {\"bbox_2d\": [568, 548, 951, 590], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 598, 951, 716], \"label\": \"Text\"}, {\"bbox_2d\": [91, 603, 506, 693], \"label\": \"Caption\"}, {\"bbox_2d\": [90, 393, 503, 589], \"label\": \"Picture\"}, {\"bbox_2d\": [252, 773, 951, 815], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 850, 951, 917], \"label\": \"Text\"}, {\"bbox_2d\": [91, 850, 506, 947], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 29, 613, 39], \"label\": \"Page-header\"}, {\"bbox_2d\": [111, 98, 411, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 294, 395, 309], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 599, 285, 615], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 143, 611, 181], \"label\": \"Text\"}, {\"bbox_2d\": [111, 210, 561, 248], \"label\": \"Text\"}, {\"bbox_2d\": [111, 338, 632, 376], \"label\": \"Text\"}, {\"bbox_2d\": [111, 405, 884, 466], \"label\": \"Text\"}, {\"bbox_2d\": [111, 494, 884, 555], \"label\": \"Text\"}, {\"bbox_2d\": [111, 645, 843, 682], \"label\": \"Text\"}, {\"bbox_2d\": [111, 690, 436, 705], \"label\": \"Text\"}, {\"bbox_2d\": [111, 734, 851, 771], \"label\": \"Text\"}, {\"bbox_2d\": [111, 779, 894, 816], \"label\": \"Text\"}, {\"bbox_2d\": [111, 845, 323, 860], \"label\": \"Text\"}, {\"bbox_2d\": [111, 868, 700, 881], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [110, 92, 412, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [110, 288, 395, 311], \"label\": \"Section-header\"}, {\"bbox_2d\": [110, 594, 285, 618], \"label\": \"Section-header\"}, {\"bbox_2d\": [110, 640, 844, 706], \"label\": \"Text\"}, {\"bbox_2d\": [110, 729, 901, 817], \"label\": \"Text\"}, {\"bbox_2d\": [110, 839, 701, 883], \"label\": \"Text\"}, {\"bbox_2d\": [110, 489, 892, 555], \"label\": \"Text\"}, {\"bbox_2d\": [110, 400, 885, 466], \"label\": \"Text\"}, {\"bbox_2d\": [110, 333, 633, 377], \"label\": \"Text\"}, {\"bbox_2d\": [110, 204, 601, 248], \"label\": \"Text\"}, {\"bbox_2d\": [110, 138, 612, 182], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 910, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 912, 427, 923], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [228, 139, 927, 909], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 916, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 138, 929, 908], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 912, 427, 923], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [272, 86, 912, 267], \"label\": \"Table\"}, {\"bbox_2d\": [272, 328, 862, 359], \"label\": \"Text\"}, {\"bbox_2d\": [272, 364, 844, 395], \"label\": \"List-item\"}, {\"bbox_2d\": [272, 414, 490, 428], \"label\": \"Text\"}, {\"bbox_2d\": [272, 757, 791, 772], \"label\": \"Text\"}, {\"bbox_2d\": [272, 781, 623, 794], \"label\": \"Caption\"}, {\"bbox_2d\": [272, 797, 912, 935], \"label\": \"Table\"}, {\"bbox_2d\": [272, 439, 911, 731], \"label\": \"Picture\"}, {\"bbox_2d\": [99, 299, 271, 318], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [271, 86, 913, 268], \"label\": \"Table\"}, {\"bbox_2d\": [466, 440, 648, 452], \"label\": \"Section-header\"}, {\"bbox_2d\": [297, 463, 790, 710], \"label\": \"Text\"}, {\"bbox_2d\": [273, 330, 862, 358], \"label\": \"Text\"}, {\"bbox_2d\": [273, 416, 489, 429], \"label\": \"Text\"}, {\"bbox_2d\": [273, 759, 791, 772], \"label\": \"Text\"}, {\"bbox_2d\": [272, 784, 625, 795], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 363, 844, 378], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 384, 388, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 301, 271, 318], \"label\": \"Section-header\"}, {\"bbox_2d\": [271, 796, 911, 934], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 97, 514, 110], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 126, 883, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 154, 537, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 182, 576, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 211, 911, 251], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 267, 911, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 296, 911, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 352, 911, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 408, 833, 421], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 437, 788, 449], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 465, 812, 477], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 494, 875, 506], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 522, 734, 534], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 550, 911, 563], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 579, 911, 620], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 636, 708, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 664, 565, 676], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 692, 660, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 721, 804, 733], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 749, 911, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 806, 800, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 834, 911, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 891, 746, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 920, 817, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 948, 779, 961], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 514, 110], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 124, 884, 139], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 153, 652, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 181, 688, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 209, 910, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 266, 907, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 294, 910, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 351, 910, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 408, 831, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 436, 788, 450], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 464, 812, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 493, 877, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 521, 763, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 549, 910, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 578, 910, 620], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 635, 708, 649], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 663, 567, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 691, 661, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 720, 804, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 748, 910, 791], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 805, 799, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 833, 910, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 890, 747, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 918, 817, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 946, 779, 961], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 414, 907, 490], \"label\": \"Text\"}, {\"bbox_2d\": [231, 504, 564, 520], \"label\": \"Text\"}, {\"bbox_2d\": [231, 526, 418, 541], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 542, 451, 557], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 558, 622, 573], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 574, 387, 588], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 590, 340, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 606, 591, 621], \"label\": \"List-item\"}, {\"bbox_2d\": [907, 984, 926, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 987, 477, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 329, 630, 363], \"label\": \"Section-header\"}, {\"bbox_2d\": [56, 90, 409, 311], \"label\": \"Picture\"}, {\"bbox_2d\": [848, 120, 883, 162], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [847, 121, 884, 170], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 333, 630, 364], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 415, 911, 490], \"label\": \"Text\"}, {\"bbox_2d\": [231, 507, 564, 519], \"label\": \"Text\"}, {\"bbox_2d\": [231, 529, 417, 541], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 545, 450, 557], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 560, 622, 572], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 576, 387, 588], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 592, 339, 604], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 607, 591, 619], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 988, 469, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [907, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [58, 93, 400, 310], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [0, 0, 1035, 1035], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [0, 0, 1035, 1033], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [690, 71, 974, 115], \"label\": \"Page-header\"}, {\"bbox_2d\": [566, 130, 973, 162], \"label\": \"Text\"}, {\"bbox_2d\": [113, 129, 524, 479], \"label\": \"Text\"}, {\"bbox_2d\": [563, 169, 974, 554], \"label\": \"Text\"}, {\"bbox_2d\": [112, 483, 523, 815], \"label\": \"Text\"}, {\"bbox_2d\": [562, 558, 974, 665], \"label\": \"Text\"}, {\"bbox_2d\": [578, 669, 974, 741], \"label\": \"Text\"}, {\"bbox_2d\": [579, 745, 974, 924], \"label\": \"Text\"}, {\"bbox_2d\": [111, 852, 523, 980], \"label\": \"Text\"}, {\"bbox_2d\": [111, 827, 256, 844], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 991, 557, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [579, 931, 974, 983], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [691, 68, 987, 115], \"label\": \"Page-header\"}, {\"bbox_2d\": [567, 126, 983, 163], \"label\": \"Text\"}, {\"bbox_2d\": [114, 126, 553, 477], \"label\": \"Text\"}, {\"bbox_2d\": [563, 164, 987, 551], \"label\": \"Text\"}, {\"bbox_2d\": [111, 479, 533, 812], \"label\": \"Text\"}, {\"bbox_2d\": [562, 552, 988, 663], \"label\": \"Text\"}, {\"bbox_2d\": [582, 665, 982, 737], \"label\": \"Text\"}, {\"bbox_2d\": [581, 738, 986, 922], \"label\": \"Text\"}, {\"bbox_2d\": [581, 924, 982, 978], \"label\": \"Text\"}, {\"bbox_2d\": [110, 847, 532, 977], \"label\": \"Text\"}, {\"bbox_2d\": [111, 824, 263, 841], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 27, 389, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 89, 948, 268], \"label\": \"Table\"}, {\"bbox_2d\": [91, 275, 635, 284], \"label\": \"Text\"}, {\"bbox_2d\": [91, 285, 847, 294], \"label\": \"Text\"}, {\"bbox_2d\": [91, 306, 945, 331], \"label\": \"Text\"}, {\"bbox_2d\": [527, 339, 946, 435], \"label\": \"Table\"}, {\"bbox_2d\": [527, 449, 945, 503], \"label\": \"Text\"}, {\"bbox_2d\": [91, 464, 507, 546], \"label\": \"Text\"}, {\"bbox_2d\": [91, 575, 507, 671], \"label\": \"Text\"}, {\"bbox_2d\": [91, 679, 507, 736], \"label\": \"Text\"}, {\"bbox_2d\": [527, 638, 945, 722], \"label\": \"Text\"}, {\"bbox_2d\": [527, 502, 768, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 558, 275, 569], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 449, 173, 460], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 528, 945, 628], \"label\": \"Text\"}, {\"bbox_2d\": [527, 734, 945, 801], \"label\": \"Text\"}, {\"bbox_2d\": [527, 811, 945, 866], \"label\": \"Text\"}, {\"bbox_2d\": [91, 746, 507, 843], \"label\": \"Text\"}, {\"bbox_2d\": [91, 999, 451, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 389, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 274, 637, 283], \"label\": \"Text\"}, {\"bbox_2d\": [91, 286, 846, 294], \"label\": \"Text\"}, {\"bbox_2d\": [91, 307, 944, 331], \"label\": \"Text\"}, {\"bbox_2d\": [91, 464, 507, 546], \"label\": \"Text\"}, {\"bbox_2d\": [528, 449, 944, 503], \"label\": \"Text\"}, {\"bbox_2d\": [91, 572, 507, 669], \"label\": \"Text\"}, {\"bbox_2d\": [91, 681, 507, 734], \"label\": \"Text\"}, {\"bbox_2d\": [91, 761, 508, 843], \"label\": \"Text\"}, {\"bbox_2d\": [528, 812, 944, 865], \"label\": \"Text\"}, {\"bbox_2d\": [528, 732, 944, 800], \"label\": \"Text\"}, {\"bbox_2d\": [528, 638, 944, 720], \"label\": \"Text\"}, {\"bbox_2d\": [528, 529, 944, 626], \"label\": \"Text\"}, {\"bbox_2d\": [91, 1000, 451, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 746, 270, 756], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 558, 275, 568], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 515, 768, 525], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 449, 174, 460], \"label\": \"Section-header\"}, {\"bbox_2d\": [90, 88, 954, 267], \"label\": \"Table\"}, {\"bbox_2d\": [91, 339, 944, 435], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 27, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [400, 82, 638, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 146, 930, 174], \"label\": \"Section-header\"}, {\"bbox_2d\": [99, 203, 928, 268], \"label\": \"Text\"}, {\"bbox_2d\": [98, 416, 960, 524], \"label\": \"Text\"}, {\"bbox_2d\": [98, 543, 863, 586], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 598, 692, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 622, 467, 641], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 647, 475, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 671, 721, 690], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 728, 814, 963], \"label\": \"Picture\"}, {\"bbox_2d\": [99, 272, 499, 290], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 296, 388, 313], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 319, 448, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 341, 574, 357], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 363, 601, 380], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [399, 75, 642, 125], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 142, 930, 175], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 541, 862, 587], \"label\": \"Section-header\"}, {\"bbox_2d\": [209, 726, 814, 965], \"label\": \"Picture\"}, {\"bbox_2d\": [103, 202, 931, 268], \"label\": \"Text\"}, {\"bbox_2d\": [98, 414, 964, 524], \"label\": \"Text\"}, {\"bbox_2d\": [98, 595, 375, 617], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 620, 462, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 645, 472, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 669, 718, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [103, 269, 495, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [103, 291, 383, 313], \"label\": \"List-item\"}, {\"bbox_2d\": [103, 313, 445, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [103, 335, 570, 357], \"label\": \"List-item\"}, {\"bbox_2d\": [103, 358, 595, 379], \"label\": \"List-item\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [730, 57, 936, 86], \"label\": \"Page-header\"}, {\"bbox_2d\": [492, 994, 522, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [251, 887, 768, 901], \"label\": \"Caption\"}, {\"bbox_2d\": [158, 126, 860, 862], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [731, 55, 949, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [159, 125, 859, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [251, 882, 772, 899], \"label\": \"Caption\"}, {\"bbox_2d\": [494, 988, 527, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [90, 45, 900, 60], \"label\": \"Section-header\"}, {\"bbox_2d\": [90, 716, 348, 726], \"label\": \"Text\"}, {\"bbox_2d\": [511, 987, 528, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [88, 139, 941, 712], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [510, 988, 527, 1005], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 47, 900, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 717, 349, 725], \"label\": \"Text\"}, {\"bbox_2d\": [91, 141, 944, 709], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [789, 74, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 525, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 74, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [542, 155, 756, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 155, 491, 953], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [158, 153, 489, 953], \"label\": \"Picture\"}, {\"bbox_2d\": [551, 152, 768, 970], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [92, 44, 378, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [126, 115, 902, 470], \"label\": \"Picture\"}, {\"bbox_2d\": [126, 560, 832, 941], \"label\": \"Picture\"}, {\"bbox_2d\": [297, 947, 726, 965], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 988, 526, 996], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [93, 42, 389, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [114, 112, 906, 939], \"label\": \"Picture\"}, {\"bbox_2d\": [299, 943, 734, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [499, 981, 532, 996], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [492, 45, 557, 59], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 76, 512, 90], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 107, 570, 117], \"label\": \"Text\"}, {\"bbox_2d\": [86, 134, 291, 146], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 306, 182, 318], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 333, 952, 344], \"label\": \"Text\"}, {\"bbox_2d\": [86, 362, 932, 389], \"label\": \"Text\"}, {\"bbox_2d\": [801, 999, 965, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [120, 174, 734, 287], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [493, 38, 558, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 69, 512, 92], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 128, 291, 147], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 155, 735, 289], \"label\": \"Table\"}, {\"bbox_2d\": [85, 102, 571, 118], \"label\": \"Text\"}, {\"bbox_2d\": [85, 358, 936, 389], \"label\": \"Text\"}, {\"bbox_2d\": [85, 330, 952, 345], \"label\": \"Text\"}, {\"bbox_2d\": [85, 300, 183, 319], \"label\": \"Section-header\"}, {\"bbox_2d\": [802, 998, 965, 1009], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 887, 113], \"label\": \"Text\"}, {\"bbox_2d\": [121, 229, 911, 275], \"label\": \"Text\"}, {\"bbox_2d\": [121, 343, 911, 386], \"label\": \"Text\"}, {\"bbox_2d\": [121, 459, 911, 503], \"label\": \"Text\"}, {\"bbox_2d\": [121, 570, 492, 587], \"label\": \"Text\"}, {\"bbox_2d\": [121, 629, 170, 646], \"label\": \"Text\"}, {\"bbox_2d\": [121, 730, 317, 747], \"label\": \"Text\"}, {\"bbox_2d\": [121, 857, 911, 958], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [341, 591, 911, 632], \"label\": \"Formula\"}, {\"bbox_2d\": [243, 666, 911, 725], \"label\": \"Formula\"}, {\"bbox_2d\": [312, 762, 911, 822], \"label\": \"Formula\"}, {\"bbox_2d\": [312, 804, 911, 864], \"label\": \"Formula\"}, {\"bbox_2d\": [333, 375, 911, 446], \"label\": \"Formula\"}, {\"bbox_2d\": [397, 299, 911, 317], \"label\": \"Formula\"}, {\"bbox_2d\": [274, 127, 911, 217], \"label\": \"Formula\"}, {\"bbox_2d\": [162, 514, 911, 564], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 885, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 229, 910, 272], \"label\": \"Text\"}, {\"bbox_2d\": [121, 340, 911, 385], \"label\": \"Text\"}, {\"bbox_2d\": [121, 458, 910, 501], \"label\": \"Text\"}, {\"bbox_2d\": [121, 571, 520, 587], \"label\": \"Text\"}, {\"bbox_2d\": [121, 657, 172, 673], \"label\": \"Text\"}, {\"bbox_2d\": [121, 731, 430, 747], \"label\": \"Text\"}, {\"bbox_2d\": [121, 860, 911, 958], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 127, 911, 222], \"label\": \"Formula\"}, {\"bbox_2d\": [424, 299, 911, 326], \"label\": \"Formula\"}, {\"bbox_2d\": [153, 515, 911, 574], \"label\": \"Formula\"}, {\"bbox_2d\": [451, 612, 911, 630], \"label\": \"Formula\"}, {\"bbox_2d\": [307, 761, 911, 843], \"label\": \"Formula\"}, {\"bbox_2d\": [332, 402, 911, 456], \"label\": \"Formula\"}, {\"bbox_2d\": [245, 670, 911, 750], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [97, 67, 114, 83], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 126, 915, 154], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 167, 302, 199], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 212, 877, 227], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 240, 419, 255], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 282, 877, 314], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 327, 483, 341], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 354, 927, 425], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 438, 479, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 480, 941, 512], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 525, 114, 540], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 579, 941, 609], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 622, 114, 637], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 677, 545, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 705, 569, 723], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 735, 956, 766], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 778, 666, 794], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 807, 542, 822], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 835, 186, 850], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 873, 943, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 916, 114, 931], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 972, 916, 988], \"label\": \"Page-footer\"}, {\"bbox_2d\": [960, 972, 1016, 988], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [98, 70, 121, 83], \"label\": \"Text\"}, {\"bbox_2d\": [98, 127, 919, 155], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 170, 308, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 212, 881, 226], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 241, 418, 254], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 283, 879, 311], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 326, 483, 339], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 354, 938, 424], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 439, 480, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 481, 944, 509], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 580, 946, 608], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 679, 549, 693], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 707, 570, 721], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 736, 959, 764], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 779, 667, 792], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 807, 547, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 835, 187, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 873, 948, 901], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 973, 1016, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 915, 114, 929], \"label\": \"Text\"}, {\"bbox_2d\": [98, 623, 118, 636], \"label\": \"Text\"}, {\"bbox_2d\": [98, 524, 123, 537], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 83, 914, 113], \"label\": \"Text\"}, {\"bbox_2d\": [293, 517, 636, 530], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 552, 785, 843], \"label\": \"Picture\"}, {\"bbox_2d\": [293, 856, 596, 869], \"label\": \"Caption\"}, {\"bbox_2d\": [598, 979, 936, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 135, 933, 507], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 910, 114], \"label\": \"Text\"}, {\"bbox_2d\": [598, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 857, 592, 868], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 519, 633, 530], \"label\": \"Caption\"}, {\"bbox_2d\": [295, 139, 931, 504], \"label\": \"Picture\"}, {\"bbox_2d\": [293, 554, 779, 843], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [786, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 523, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 145, 746, 154], \"label\": \"Text\"}, {\"bbox_2d\": [282, 145, 351, 154], \"label\": \"Text\"}, {\"bbox_2d\": [190, 166, 425, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 172, 842, 970], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [786, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 92, 523, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 314, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [553, 143, 861, 970], \"label\": \"Picture\"}, {\"bbox_2d\": [188, 143, 430, 971], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [786, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [200, 143, 414, 965], \"label\": \"Picture\"}, {\"bbox_2d\": [604, 143, 813, 930], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 958, 894, 983], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [786, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 525, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 958, 894, 982], \"label\": \"Text\"}, {\"bbox_2d\": [607, 142, 809, 927], \"label\": \"Picture\"}, {\"bbox_2d\": [211, 143, 411, 970], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [92, 347, 594, 501], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 183, 273, 269], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [92, 342, 593, 504], \"label\": \"Title\"}, {\"bbox_2d\": [90, 183, 276, 266], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [89, 78, 989, 984], \"label\": \"Table\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [86, 77, 990, 981], \"label\": \"Table\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [34, 200, 61, 770], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 92, 901, 145], \"label\": \"Title\"}, {\"bbox_2d\": [253, 175, 778, 275], \"label\": \"Text\"}, {\"bbox_2d\": [409, 289, 622, 303], \"label\": \"Text\"}, {\"bbox_2d\": [121, 352, 911, 678], \"label\": \"Text\"}, {\"bbox_2d\": [471, 321, 562, 338], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [130, 94, 902, 146], \"label\": \"Title\"}, {\"bbox_2d\": [254, 172, 777, 189], \"label\": \"Text\"}, {\"bbox_2d\": [270, 203, 762, 275], \"label\": \"Text\"}, {\"bbox_2d\": [409, 289, 623, 304], \"label\": \"Text\"}, {\"bbox_2d\": [121, 352, 911, 678], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [471, 320, 562, 339], \"label\": \"Section-header\"}, {\"bbox_2d\": [31, 199, 61, 771], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 27, 438, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 92, 962, 170], \"label\": \"Text\"}, {\"bbox_2d\": [85, 184, 943, 230], \"label\": \"Text\"}, {\"bbox_2d\": [85, 245, 944, 275], \"label\": \"Text\"}, {\"bbox_2d\": [85, 289, 967, 349], \"label\": \"Text\"}, {\"bbox_2d\": [85, 364, 935, 394], \"label\": \"Text\"}, {\"bbox_2d\": [85, 423, 957, 517], \"label\": \"Text\"}, {\"bbox_2d\": [85, 532, 965, 624], \"label\": \"Text\"}, {\"bbox_2d\": [85, 638, 469, 652], \"label\": \"Text\"}, {\"bbox_2d\": [85, 407, 253, 422], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 835, 961, 881], \"label\": \"Text\"}, {\"bbox_2d\": [85, 777, 461, 788], \"label\": \"Text\"}, {\"bbox_2d\": [85, 791, 923, 827], \"label\": \"Text\"}, {\"bbox_2d\": [85, 761, 963, 773], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 753, 962, 770], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 665, 960, 752], \"label\": \"Table\"}, {\"bbox_2d\": [801, 997, 965, 1005], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 22, 210, 45], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 41, 439, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 62, 628, 78], \"label\": \"Text\"}, {\"bbox_2d\": [85, 403, 253, 422], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 92, 966, 171], \"label\": \"Text\"}, {\"bbox_2d\": [85, 183, 943, 230], \"label\": \"Text\"}, {\"bbox_2d\": [85, 243, 944, 274], \"label\": \"Text\"}, {\"bbox_2d\": [85, 286, 966, 349], \"label\": \"Text\"}, {\"bbox_2d\": [85, 362, 936, 393], \"label\": \"Text\"}, {\"bbox_2d\": [85, 421, 956, 516], \"label\": \"Text\"}, {\"bbox_2d\": [85, 529, 967, 623], \"label\": \"Text\"}, {\"bbox_2d\": [85, 636, 471, 651], \"label\": \"Text\"}, {\"bbox_2d\": [85, 775, 463, 787], \"label\": \"Text\"}, {\"bbox_2d\": [85, 788, 921, 825], \"label\": \"Text\"}, {\"bbox_2d\": [85, 833, 960, 880], \"label\": \"Text\"}, {\"bbox_2d\": [82, 660, 967, 775], \"label\": \"Table\"}, {\"bbox_2d\": [802, 996, 965, 1006], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [99, 82, 914, 471], \"label\": \"Table\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [99, 84, 929, 471], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 576, 137], \"label\": \"Text\"}, {\"bbox_2d\": [274, 603, 467, 618], \"label\": \"Caption\"}, {\"bbox_2d\": [274, 189, 877, 593], \"label\": \"Picture\"}, {\"bbox_2d\": [101, 977, 378, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [273, 83, 577, 96], \"label\": \"Text\"}, {\"bbox_2d\": [272, 606, 468, 617], \"label\": \"Caption\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 189, 878, 593], \"label\": \"Picture\"}, {\"bbox_2d\": [281, 109, 532, 122], \"label\": \"Text\"}, {\"bbox_2d\": [281, 122, 456, 135], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [148, 182, 818, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 310, 885, 351], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 387, 887, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 489, 887, 555], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 591, 887, 733], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 769, 887, 836], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 870, 887, 911], \"label\": \"List-item\"}, {\"bbox_2d\": [260, 208, 880, 275], \"label\": \"Text\"}, {\"bbox_2d\": [148, 108, 230, 123], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [148, 108, 231, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [150, 184, 818, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 211, 883, 274], \"label\": \"Text\"}, {\"bbox_2d\": [148, 311, 887, 351], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 387, 887, 453], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 489, 887, 554], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 591, 888, 732], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 769, 888, 834], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 871, 887, 911], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [149, 93, 910, 182], \"label\": \"Text\"}, {\"bbox_2d\": [150, 194, 910, 359], \"label\": \"Text\"}, {\"bbox_2d\": [150, 372, 911, 563], \"label\": \"Text\"}, {\"bbox_2d\": [150, 576, 910, 689], \"label\": \"Text\"}, {\"bbox_2d\": [150, 702, 910, 868], \"label\": \"Text\"}, {\"bbox_2d\": [150, 881, 910, 945], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [150, 92, 910, 182], \"label\": \"Text\"}, {\"bbox_2d\": [150, 194, 912, 360], \"label\": \"Text\"}, {\"bbox_2d\": [150, 372, 912, 564], \"label\": \"Text\"}, {\"bbox_2d\": [150, 576, 910, 691], \"label\": \"Text\"}, {\"bbox_2d\": [150, 703, 910, 869], \"label\": \"Text\"}, {\"bbox_2d\": [150, 882, 912, 945], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [786, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [503, 93, 527, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 74, 314, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [172, 141, 453, 971], \"label\": \"Picture\"}, {\"bbox_2d\": [574, 141, 835, 967], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [786, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [578, 143, 844, 972], \"label\": \"Picture\"}, {\"bbox_2d\": [174, 142, 452, 970], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 837, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 123, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [121, 506, 911, 633], \"label\": \"Text\"}, {\"bbox_2d\": [121, 644, 911, 823], \"label\": \"Text\"}, {\"bbox_2d\": [121, 834, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [901, 49, 911, 65], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [901, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 95, 836, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 122, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [121, 505, 911, 642], \"label\": \"Text\"}, {\"bbox_2d\": [121, 642, 911, 740], \"label\": \"Text\"}, {\"bbox_2d\": [121, 752, 911, 822], \"label\": \"Text\"}, {\"bbox_2d\": [121, 834, 911, 959], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [786, 67, 969, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [559, 111, 977, 596], \"label\": \"Text\"}, {\"bbox_2d\": [559, 598, 980, 922], \"label\": \"Text\"}, {\"bbox_2d\": [109, 889, 524, 983], \"label\": \"Text\"}, {\"bbox_2d\": [109, 405, 523, 892], \"label\": \"Text\"}, {\"bbox_2d\": [107, 110, 517, 298], \"label\": \"Text\"}, {\"bbox_2d\": [107, 297, 517, 334], \"label\": \"Text\"}, {\"bbox_2d\": [536, 985, 552, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [562, 923, 977, 983], \"label\": \"Text\"}, {\"bbox_2d\": [237, 347, 384, 389], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [785, 68, 985, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [105, 111, 526, 296], \"label\": \"Text\"}, {\"bbox_2d\": [105, 297, 532, 334], \"label\": \"Text\"}, {\"bbox_2d\": [107, 404, 539, 890], \"label\": \"Text\"}, {\"bbox_2d\": [111, 891, 532, 984], \"label\": \"Text\"}, {\"bbox_2d\": [535, 993, 558, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [558, 110, 980, 316], \"label\": \"Text\"}, {\"bbox_2d\": [560, 325, 971, 354], \"label\": \"Text\"}, {\"bbox_2d\": [560, 354, 986, 597], \"label\": \"Text\"}, {\"bbox_2d\": [561, 598, 988, 728], \"label\": \"Text\"}, {\"bbox_2d\": [562, 732, 984, 755], \"label\": \"Text\"}, {\"bbox_2d\": [563, 767, 987, 922], \"label\": \"Text\"}, {\"bbox_2d\": [565, 924, 986, 982], \"label\": \"Text\"}, {\"bbox_2d\": [221, 339, 394, 397], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [780, 74, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 143, 500, 428], \"label\": \"Text\"}, {\"bbox_2d\": [526, 143, 896, 379], \"label\": \"Text\"}, {\"bbox_2d\": [154, 471, 500, 584], \"label\": \"Picture\"}, {\"bbox_2d\": [130, 690, 500, 744], \"label\": \"Text\"}, {\"bbox_2d\": [154, 656, 459, 678], \"label\": \"Text\"}, {\"bbox_2d\": [130, 787, 500, 980], \"label\": \"Text\"}, {\"bbox_2d\": [526, 797, 895, 852], \"label\": \"Text\"}, {\"bbox_2d\": [526, 859, 896, 982], \"label\": \"Text\"}, {\"bbox_2d\": [565, 755, 856, 778], \"label\": \"Text\"}, {\"bbox_2d\": [154, 763, 473, 773], \"label\": \"Text\"}, {\"bbox_2d\": [555, 409, 895, 546], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 566, 813, 693], \"label\": \"Picture\"}, {\"bbox_2d\": [604, 704, 756, 714], \"label\": \"Caption\"}, {\"bbox_2d\": [214, 595, 364, 602], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [779, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 523, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 142, 896, 377], \"label\": \"Text\"}, {\"bbox_2d\": [130, 691, 500, 744], \"label\": \"Text\"}, {\"bbox_2d\": [130, 785, 503, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 862, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 796, 895, 851], \"label\": \"Text\"}, {\"bbox_2d\": [564, 755, 860, 779], \"label\": \"Section-header\"}, {\"bbox_2d\": [158, 763, 472, 773], \"label\": \"Section-header\"}, {\"bbox_2d\": [173, 656, 461, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [554, 409, 896, 693], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 471, 501, 585], \"label\": \"Picture\"}, {\"bbox_2d\": [606, 704, 758, 712], \"label\": \"Caption\"}, {\"bbox_2d\": [216, 594, 362, 601], \"label\": \"Caption\"}, {\"bbox_2d\": [130, 142, 500, 181], \"label\": \"Text\"}, {\"bbox_2d\": [130, 187, 503, 427], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [321, 46, 583, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [321, 79, 904, 158], \"label\": \"Text\"}, {\"bbox_2d\": [321, 168, 918, 207], \"label\": \"Text\"}, {\"bbox_2d\": [321, 221, 918, 301], \"label\": \"Text\"}, {\"bbox_2d\": [321, 312, 903, 369], \"label\": \"Text\"}, {\"bbox_2d\": [321, 380, 697, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [321, 395, 908, 445], \"label\": \"Text\"}, {\"bbox_2d\": [321, 456, 918, 566], \"label\": \"Text\"}, {\"bbox_2d\": [321, 578, 903, 605], \"label\": \"Text\"}, {\"bbox_2d\": [321, 616, 910, 684], \"label\": \"Text\"}, {\"bbox_2d\": [321, 694, 910, 791], \"label\": \"Text\"}, {\"bbox_2d\": [321, 816, 918, 910], \"label\": \"Text\"}, {\"bbox_2d\": [92, 988, 102, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [60, 155, 260, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [321, 803, 704, 814], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 352, 290, 597], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [60, 158, 259, 168], \"label\": \"Text\"}, {\"bbox_2d\": [319, 78, 903, 158], \"label\": \"Text\"}, {\"bbox_2d\": [319, 169, 888, 208], \"label\": \"Text\"}, {\"bbox_2d\": [319, 221, 912, 301], \"label\": \"Text\"}, {\"bbox_2d\": [319, 314, 889, 367], \"label\": \"Text\"}, {\"bbox_2d\": [319, 393, 913, 446], \"label\": \"Text\"}, {\"bbox_2d\": [319, 458, 911, 566], \"label\": \"Text\"}, {\"bbox_2d\": [319, 579, 904, 604], \"label\": \"Text\"}, {\"bbox_2d\": [319, 616, 909, 683], \"label\": \"Text\"}, {\"bbox_2d\": [319, 696, 912, 790], \"label\": \"Text\"}, {\"bbox_2d\": [319, 816, 913, 910], \"label\": \"Text\"}, {\"bbox_2d\": [91, 353, 289, 598], \"label\": \"Text\"}, {\"bbox_2d\": [319, 379, 696, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [319, 802, 703, 814], \"label\": \"Section-header\"}, {\"bbox_2d\": [319, 47, 583, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [93, 990, 101, 998], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [47, 119, 1009, 631], \"label\": \"Table\"}, {\"bbox_2d\": [49, 636, 830, 659], \"label\": \"Text\"}, {\"bbox_2d\": [49, 659, 1009, 677], \"label\": \"Text\"}, {\"bbox_2d\": [49, 637, 1009, 645], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 663, 697, 669], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 677, 276, 690], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 652, 583, 663], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 630, 225, 637], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 609, 577, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 589, 201, 596], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 571, 259, 578], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 553, 225, 560], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 528, 259, 535], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 509, 577, 516], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 492, 225, 499], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 476, 259, 483], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 460, 485, 467], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 443, 495, 450], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 426, 358, 433], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 409, 358, 416], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 394, 577, 401], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 377, 583, 384], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 350, 980, 357], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 324, 977, 332], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 307, 1009, 314], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 291, 977, 298], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 266, 977, 273], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 241, 977, 248], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 224, 977, 231], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 200, 977, 207], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 167, 977, 174], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 142, 977, 149], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 125, 977, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [859, 25, 1009, 37], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [861, 24, 1009, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [49, 640, 325, 648], \"label\": \"Text\"}, {\"bbox_2d\": [49, 654, 277, 662], \"label\": \"Text\"}, {\"bbox_2d\": [47, 117, 976, 633], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 94, 153, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 124, 430, 137], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 139, 510, 168], \"label\": \"Text\"}, {\"bbox_2d\": [91, 171, 504, 181], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 188, 265, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 204, 360, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 221, 191, 231], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 237, 314, 247], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 254, 505, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 325, 647, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 359, 120, 370], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 388, 140, 398], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 475, 142, 485], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 537, 158, 548], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 565, 197, 575], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 616, 192, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 630, 507, 654], \"label\": \"Text\"}, {\"bbox_2d\": [91, 669, 509, 679], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 694, 509, 744], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 753, 314, 764], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 788, 354, 793], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 799, 507, 824], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 832, 507, 857], \"label\": \"Text\"}, {\"bbox_2d\": [91, 861, 260, 871], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 877, 495, 887], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 893, 382, 903], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 616, 944, 640], \"label\": \"Text\"}, {\"bbox_2d\": [529, 793, 939, 832], \"label\": \"Text\"}, {\"bbox_2d\": [529, 644, 944, 698], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 702, 680, 712], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 716, 944, 768], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 773, 735, 783], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 727, 944, 742], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 837, 807, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 841, 807, 851], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 779, 652, 789], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 718, 680, 728], \"label\": \"List-item\"}, {\"bbox_2d\": [529, 608, 944, 618], \"label\": \"Text\"}, {\"bbox_2d\": [529, 475, 913, 485], \"label\": \"Text\"}, {\"bbox_2d\": [529, 499, 811, 509], \"label\": \"Text\"}, {\"bbox_2d\": [529, 523, 944, 533], \"label\": \"Text\"}, {\"bbox_2d\": [529, 547, 944, 557], \"label\": \"Text\"}, {\"bbox_2d\": [529, 561, 904, 571], \"label\": \"Text\"}, {\"bbox_2d\": [529, 575, 904, 585], \"label\": \"Text\"}, {\"bbox_2d\": [529, 360, 913, 380], \"label\": \"Text\"}, {\"bbox_2d\": [529, 127, 944, 291], \"label\": \"Picture\"}, {\"bbox_2d\": [591, 196, 638, 231], \"label\": \"Picture\"}, {\"bbox_2d\": [591, 214, 635, 231], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 388, 591, 398], \"label\": \"Text\"}, {\"bbox_2d\": [255, 402, 813, 412], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 416, 458, 426], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 430, 864, 440], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 444, 730, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 458, 944, 533], \"label\": \"List-item\"}, {\"bbox_2d\": [591, 160, 615, 179], \"label\": \"Picture\"}, {\"bbox_2d\": [591, 196, 616, 214], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 658, 509, 679], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 683, 486, 704], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 708, 507, 744], \"label\": \"List-item\"}, {\"bbox_2d\": [591, 999, 944, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [620, 999, 919, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [591, 127, 688, 138], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 309], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 616, 192, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 93, 154, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 124, 429, 135], \"label\": \"Section-header\"}, {\"bbox_2d\": [529, 115, 944, 291], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 140, 507, 167], \"label\": \"Text\"}, {\"bbox_2d\": [91, 171, 502, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 187, 264, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 203, 360, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 218, 190, 230], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 234, 314, 245], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 250, 506, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 326, 648, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 630, 507, 653], \"label\": \"Text\"}, {\"bbox_2d\": [528, 615, 944, 640], \"label\": \"Text\"}, {\"bbox_2d\": [91, 754, 315, 764], \"label\": \"Text\"}, {\"bbox_2d\": [528, 793, 794, 803], \"label\": \"Text\"}, {\"bbox_2d\": [91, 831, 507, 856], \"label\": \"Text\"}, {\"bbox_2d\": [591, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 656, 507, 679], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 682, 507, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 708, 507, 744], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 768, 427, 778], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 783, 354, 793], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 797, 507, 822], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 860, 260, 870], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 874, 495, 884], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 889, 381, 899], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 808, 938, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 822, 921, 832], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 836, 805, 846], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 644, 944, 697], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 702, 680, 712], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 716, 944, 740], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 745, 944, 769], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 774, 735, 784], \"label\": \"List-item\"}, {\"bbox_2d\": [74, 346, 956, 597], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [131, 73, 308, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [507, 93, 516, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [783, 73, 888, 86], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 143, 495, 192], \"label\": \"Text\"}, {\"bbox_2d\": [131, 194, 495, 584], \"label\": \"Text\"}, {\"bbox_2d\": [131, 586, 496, 624], \"label\": \"Text\"}, {\"bbox_2d\": [131, 660, 496, 685], \"label\": \"Text\"}, {\"bbox_2d\": [131, 688, 495, 778], \"label\": \"Text\"}, {\"bbox_2d\": [131, 814, 496, 949], \"label\": \"Text\"}, {\"bbox_2d\": [131, 948, 495, 980], \"label\": \"Text\"}, {\"bbox_2d\": [526, 947, 894, 984], \"label\": \"Text\"}, {\"bbox_2d\": [526, 877, 789, 886], \"label\": \"Text\"}, {\"bbox_2d\": [526, 888, 818, 897], \"label\": \"Text\"}, {\"bbox_2d\": [526, 899, 734, 907], \"label\": \"Text\"}, {\"bbox_2d\": [526, 588, 895, 648], \"label\": \"Text\"}, {\"bbox_2d\": [526, 437, 894, 583], \"label\": \"Text\"}, {\"bbox_2d\": [526, 380, 894, 480], \"label\": \"Text\"}, {\"bbox_2d\": [526, 252, 688, 261], \"label\": \"Text\"}, {\"bbox_2d\": [526, 263, 894, 437], \"label\": \"Text\"}, {\"bbox_2d\": [526, 143, 894, 243], \"label\": \"Text\"}, {\"bbox_2d\": [260, 639, 365, 650], \"label\": \"Text\"}, {\"bbox_2d\": [222, 792, 405, 803], \"label\": \"Text\"}, {\"bbox_2d\": [684, 920, 734, 930], \"label\": \"Text\"}, {\"bbox_2d\": [526, 692, 895, 863], \"label\": \"Table\"}, {\"bbox_2d\": [218, 896, 405, 907], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [781, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 93, 520, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 74, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 142, 500, 191], \"label\": \"Text\"}, {\"bbox_2d\": [131, 195, 499, 283], \"label\": \"Text\"}, {\"bbox_2d\": [130, 287, 503, 584], \"label\": \"Text\"}, {\"bbox_2d\": [130, 587, 500, 624], \"label\": \"Text\"}, {\"bbox_2d\": [131, 661, 500, 684], \"label\": \"Text\"}, {\"bbox_2d\": [130, 687, 503, 776], \"label\": \"Text\"}, {\"bbox_2d\": [130, 814, 500, 942], \"label\": \"Text\"}, {\"bbox_2d\": [131, 946, 501, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 895, 245], \"label\": \"Text\"}, {\"bbox_2d\": [527, 250, 691, 260], \"label\": \"Text\"}, {\"bbox_2d\": [526, 266, 895, 432], \"label\": \"Text\"}, {\"bbox_2d\": [526, 438, 898, 527], \"label\": \"Text\"}, {\"bbox_2d\": [526, 532, 896, 581], \"label\": \"Text\"}, {\"bbox_2d\": [526, 587, 895, 649], \"label\": \"Text\"}, {\"bbox_2d\": [526, 946, 894, 982], \"label\": \"Text\"}, {\"bbox_2d\": [684, 920, 736, 930], \"label\": \"Section-header\"}, {\"bbox_2d\": [221, 792, 410, 802], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 875, 788, 883], \"label\": \"Text\"}, {\"bbox_2d\": [526, 886, 730, 894], \"label\": \"Text\"}, {\"bbox_2d\": [261, 640, 369, 650], \"label\": \"Section-header\"}, {\"bbox_2d\": [524, 680, 891, 870], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [142, 104, 903, 932], \"label\": \"Table\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}, {\"bbox_2d\": [141, 104, 904, 932], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [52, 60, 923, 928], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [61, 56, 923, 932], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 54, 284, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [316, 930, 733, 949], \"label\": \"Caption\"}, {\"bbox_2d\": [516, 994, 536, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [256, 126, 793, 913], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [257, 121, 794, 915], \"label\": \"Picture\"}, {\"bbox_2d\": [317, 927, 739, 942], \"label\": \"Caption\"}, {\"bbox_2d\": [98, 54, 295, 85], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [459, 72, 625, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [736, 61, 906, 71], \"label\": \"Page-header\"}, {\"bbox_2d\": [111, 32, 293, 125], \"label\": \"Picture\"}, {\"bbox_2d\": [828, 82, 908, 122], \"label\": \"Picture\"}, {\"bbox_2d\": [217, 132, 433, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 164, 906, 205], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 225, 906, 250], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 269, 906, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 314, 906, 342], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 361, 387, 374], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 394, 906, 435], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 454, 906, 495], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 514, 380, 527], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 546, 906, 601], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 620, 906, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 665, 568, 678], \"label\": \"Section-header\"}, {\"bbox_2d\": [220, 697, 556, 711], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 730, 906, 785], \"label\": \"Text\"}, {\"bbox_2d\": [179, 804, 906, 873], \"label\": \"Text\"}, {\"bbox_2d\": [179, 891, 906, 918], \"label\": \"Text\"}, {\"bbox_2d\": [887, 974, 904, 985], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [94, 29, 303, 127], \"label\": \"Picture\"}, {\"bbox_2d\": [458, 72, 627, 86], \"label\": \"Text\"}, {\"bbox_2d\": [735, 60, 905, 71], \"label\": \"Page-header\"}, {\"bbox_2d\": [826, 78, 911, 126], \"label\": \"Picture\"}, {\"bbox_2d\": [217, 131, 432, 146], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 163, 907, 206], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 223, 907, 251], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 269, 907, 297], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 314, 906, 342], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 360, 392, 375], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 392, 912, 435], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 452, 907, 494], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 512, 383, 527], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 545, 911, 601], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 618, 908, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 664, 571, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [219, 697, 559, 711], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 728, 906, 785], \"label\": \"Text\"}, {\"bbox_2d\": [179, 802, 909, 872], \"label\": \"Text\"}, {\"bbox_2d\": [179, 890, 906, 918], \"label\": \"Text\"}, {\"bbox_2d\": [886, 973, 900, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [90, 65, 787, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [90, 609, 947, 651], \"label\": \"Text\"}, {\"bbox_2d\": [90, 563, 302, 579], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 124, 954, 561], \"label\": \"Table\"}, {\"bbox_2d\": [87, 676, 954, 723], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [88, 60, 790, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [86, 122, 954, 722], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 20, 614, 42], \"label\": \"Page-header\"}, {\"bbox_2d\": [98, 208, 384, 231], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 427, 286, 448], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 641, 458, 663], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 256, 588, 298], \"label\": \"Text\"}, {\"bbox_2d\": [98, 321, 946, 387], \"label\": \"Text\"}, {\"bbox_2d\": [98, 474, 904, 516], \"label\": \"Text\"}, {\"bbox_2d\": [98, 540, 850, 582], \"label\": \"Text\"}, {\"bbox_2d\": [98, 584, 888, 605], \"label\": \"Text\"}, {\"bbox_2d\": [98, 687, 754, 730], \"label\": \"Text\"}, {\"bbox_2d\": [98, 732, 874, 753], \"label\": \"Text\"}, {\"bbox_2d\": [98, 776, 769, 839], \"label\": \"Text\"}, {\"bbox_2d\": [98, 854, 915, 963], \"label\": \"Picture\"}, {\"bbox_2d\": [98, 101, 915, 178], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [95, 99, 917, 177], \"label\": \"Picture\"}, {\"bbox_2d\": [93, 853, 919, 964], \"label\": \"Picture\"}, {\"bbox_2d\": [98, 207, 378, 231], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 425, 281, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 639, 451, 663], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 685, 871, 751], \"label\": \"Text\"}, {\"bbox_2d\": [98, 774, 764, 840], \"label\": \"Text\"}, {\"bbox_2d\": [98, 537, 886, 603], \"label\": \"Text\"}, {\"bbox_2d\": [98, 471, 902, 515], \"label\": \"Text\"}, {\"bbox_2d\": [98, 253, 587, 297], \"label\": \"Text\"}, {\"bbox_2d\": [98, 320, 948, 386], \"label\": \"Text\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [783, 53, 987, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 99, 544, 191], \"label\": \"Text\"}, {\"bbox_2d\": [129, 195, 544, 303], \"label\": \"Text\"}, {\"bbox_2d\": [578, 101, 988, 342], \"label\": \"Text\"}, {\"bbox_2d\": [578, 346, 988, 551], \"label\": \"Text\"}, {\"bbox_2d\": [129, 367, 544, 589], \"label\": \"Text\"}, {\"bbox_2d\": [129, 593, 544, 897], \"label\": \"Text\"}, {\"bbox_2d\": [129, 901, 545, 978], \"label\": \"Text\"}, {\"bbox_2d\": [578, 782, 988, 978], \"label\": \"Text\"}, {\"bbox_2d\": [541, 989, 575, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [129, 332, 299, 348], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [784, 56, 1000, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 97, 549, 190], \"label\": \"Text\"}, {\"bbox_2d\": [129, 191, 552, 302], \"label\": \"Text\"}, {\"bbox_2d\": [128, 328, 299, 345], \"label\": \"Section-header\"}, {\"bbox_2d\": [127, 363, 547, 587], \"label\": \"Text\"}, {\"bbox_2d\": [127, 588, 548, 718], \"label\": \"Text\"}, {\"bbox_2d\": [127, 719, 546, 894], \"label\": \"Text\"}, {\"bbox_2d\": [128, 897, 546, 974], \"label\": \"Text\"}, {\"bbox_2d\": [542, 984, 575, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [577, 776, 996, 975], \"label\": \"Text\"}, {\"bbox_2d\": [577, 548, 992, 623], \"label\": \"Text\"}, {\"bbox_2d\": [577, 624, 996, 774], \"label\": \"Text\"}, {\"bbox_2d\": [577, 342, 997, 548], \"label\": \"Text\"}, {\"bbox_2d\": [578, 99, 1002, 210], \"label\": \"Text\"}, {\"bbox_2d\": [577, 211, 996, 341], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 918, 167], \"label\": \"Text\"}, {\"bbox_2d\": [231, 185, 917, 228], \"label\": \"Text\"}, {\"bbox_2d\": [231, 246, 885, 272], \"label\": \"Text\"}, {\"bbox_2d\": [231, 488, 926, 563], \"label\": \"Text\"}, {\"bbox_2d\": [691, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 282, 904, 326], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 336, 926, 427], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 574, 926, 600], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 612, 917, 654], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 666, 909, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 704, 914, 746], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 758, 895, 785], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 797, 902, 823], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 834, 919, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 454, 428, 469], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 920, 168], \"label\": \"Text\"}, {\"bbox_2d\": [231, 185, 920, 228], \"label\": \"Text\"}, {\"bbox_2d\": [231, 245, 884, 272], \"label\": \"Text\"}, {\"bbox_2d\": [231, 488, 926, 563], \"label\": \"Text\"}, {\"bbox_2d\": [691, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 832, 921, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 794, 913, 822], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 756, 895, 784], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 703, 915, 746], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 665, 910, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 611, 919, 654], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 573, 926, 601], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 333, 926, 427], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 279, 905, 327], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 454, 428, 470], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 133, 888, 631], \"label\": \"Text\"}, {\"bbox_2d\": [147, 667, 888, 910], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 135, 887, 631], \"label\": \"Text\"}, {\"bbox_2d\": [148, 668, 901, 910], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 601, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 132, 669, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 164, 536, 174], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 181, 914, 225], \"label\": \"Text\"}, {\"bbox_2d\": [121, 343, 914, 404], \"label\": \"Text\"}, {\"bbox_2d\": [121, 426, 125, 433], \"label\": \"Text\"}, {\"bbox_2d\": [152, 246, 507, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 262, 601, 272], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 279, 458, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 295, 764, 305], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 312, 765, 322], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 941, 537, 954], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 605, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 674, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 160, 536, 177], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 177, 919, 225], \"label\": \"Text\"}, {\"bbox_2d\": [121, 340, 918, 404], \"label\": \"Text\"}, {\"bbox_2d\": [121, 420, 129, 436], \"label\": \"Text\"}, {\"bbox_2d\": [152, 307, 769, 324], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 291, 767, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 274, 462, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 257, 606, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 241, 511, 258], \"label\": \"List-item\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [340, 40, 960, 73], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 93, 578, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [154, 125, 877, 182], \"label\": \"Text\"}, {\"bbox_2d\": [154, 198, 877, 255], \"label\": \"Text\"}, {\"bbox_2d\": [154, 271, 877, 355], \"label\": \"Text\"}, {\"bbox_2d\": [109, 369, 898, 412], \"label\": \"Text\"}, {\"bbox_2d\": [109, 428, 922, 608], \"label\": \"Text\"}, {\"bbox_2d\": [109, 623, 923, 789], \"label\": \"Text\"}, {\"bbox_2d\": [109, 804, 914, 895], \"label\": \"Text\"}, {\"bbox_2d\": [109, 909, 914, 969], \"label\": \"Text\"}, {\"bbox_2d\": [781, 987, 974, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 96, 578, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 128, 874, 183], \"label\": \"Text\"}, {\"bbox_2d\": [152, 200, 874, 255], \"label\": \"Text\"}, {\"bbox_2d\": [152, 272, 876, 355], \"label\": \"Text\"}, {\"bbox_2d\": [110, 370, 897, 412], \"label\": \"Text\"}, {\"bbox_2d\": [110, 430, 918, 608], \"label\": \"Text\"}, {\"bbox_2d\": [110, 626, 920, 788], \"label\": \"Text\"}, {\"bbox_2d\": [110, 806, 911, 894], \"label\": \"Text\"}, {\"bbox_2d\": [110, 912, 923, 969], \"label\": \"Text\"}, {\"bbox_2d\": [781, 990, 974, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [340, 43, 958, 74], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [766, 65, 933, 94], \"label\": \"Page-header\"}, {\"bbox_2d\": [165, 312, 660, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [166, 372, 855, 979], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [763, 65, 949, 95], \"label\": \"Page-header\"}, {\"bbox_2d\": [347, 312, 684, 339], \"label\": \"Section-header\"}, {\"bbox_2d\": [161, 359, 862, 980], \"label\": \"Table\"}, {\"bbox_2d\": [507, 988, 515, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 97, 797, 111], \"label\": \"Text\"}, {\"bbox_2d\": [147, 452, 797, 491], \"label\": \"Text\"}, {\"bbox_2d\": [147, 554, 177, 573], \"label\": \"Text\"}, {\"bbox_2d\": [121, 633, 161, 651], \"label\": \"Text\"}, {\"bbox_2d\": [121, 684, 161, 699], \"label\": \"Text\"}, {\"bbox_2d\": [121, 792, 464, 807], \"label\": \"Text\"}, {\"bbox_2d\": [147, 791, 464, 807], \"label\": \"Text\"}, {\"bbox_2d\": [121, 903, 161, 918], \"label\": \"Text\"}, {\"bbox_2d\": [147, 920, 797, 967], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 827, 797, 909], \"label\": \"Formula\"}, {\"bbox_2d\": [191, 742, 797, 783], \"label\": \"Formula\"}, {\"bbox_2d\": [158, 665, 797, 710], \"label\": \"Formula\"}, {\"bbox_2d\": [325, 585, 797, 627], \"label\": \"Formula\"}, {\"bbox_2d\": [300, 477, 797, 519], \"label\": \"Formula\"}, {\"bbox_2d\": [147, 342, 797, 431], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 285, 797, 339], \"label\": \"Formula\"}, {\"bbox_2d\": [279, 255, 797, 284], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 255, 447, 273], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 196, 394, 215], \"label\": \"Formula\"}, {\"bbox_2d\": [252, 163, 797, 218], \"label\": \"Formula\"}, {\"bbox_2d\": [252, 224, 797, 254], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 132, 302, 151], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 319, 365, 339], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 374, 797, 417], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 227, 447, 247], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 257, 447, 277], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 296, 447, 316], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 376, 447, 396], \"label\": \"Formula\"}, {\"bbox_2d\": [279, 344, 797, 374], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 385, 447, 405], \"label\": \"Formula\"}, {\"bbox_2d\": [279, 406, 797, 436], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 449, 447, 469], \"label\": \"Formula\"}, {\"bbox_2d\": [385, 745, 797, 786], \"label\": \"Formula\"}, {\"bbox_2d\": [385, 823, 797, 865], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 804, 464, 818], \"label\": \"Formula\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [147, 96, 799, 110], \"label\": \"Text\"}, {\"bbox_2d\": [121, 448, 911, 490], \"label\": \"Text\"}, {\"bbox_2d\": [121, 714, 190, 728], \"label\": \"Text\"}, {\"bbox_2d\": [147, 793, 463, 807], \"label\": \"Text\"}, {\"bbox_2d\": [121, 903, 157, 917], \"label\": \"Text\"}, {\"bbox_2d\": [205, 376, 910, 444], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 315, 910, 383], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 254, 910, 322], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 193, 910, 261], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 136, 910, 200], \"label\": \"Formula\"}, {\"bbox_2d\": [321, 506, 910, 540], \"label\": \"Formula\"}, {\"bbox_2d\": [351, 585, 910, 619], \"label\": \"Formula\"}, {\"bbox_2d\": [192, 661, 910, 708], \"label\": \"Formula\"}, {\"bbox_2d\": [306, 744, 910, 778], \"label\": \"Formula\"}, {\"bbox_2d\": [174, 828, 910, 880], \"label\": \"Formula\"}, {\"bbox_2d\": [208, 929, 910, 976], \"label\": \"Formula\"}, {\"bbox_2d\": [498, 989, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 635, 157, 649], \"label\": \"Text\"}, {\"bbox_2d\": [121, 555, 151, 570], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [907, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 336, 951, 391], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 738, 951, 778], \"label\": \"Caption\"}, {\"bbox_2d\": [133, 68, 895, 319], \"label\": \"Picture\"}, {\"bbox_2d\": [132, 452, 894, 723], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [135, 68, 905, 320], \"label\": \"Picture\"}, {\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 336, 951, 389], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 738, 951, 777], \"label\": \"Caption\"}, {\"bbox_2d\": [137, 452, 910, 719], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 149, 303, 294], \"label\": \"Text\"}, {\"bbox_2d\": [72, 313, 268, 355], \"label\": \"Text\"}, {\"bbox_2d\": [72, 359, 293, 444], \"label\": \"Text\"}, {\"bbox_2d\": [72, 464, 271, 506], \"label\": \"Text\"}, {\"bbox_2d\": [72, 510, 288, 595], \"label\": \"Text\"}, {\"bbox_2d\": [72, 615, 268, 657], \"label\": \"Text\"}, {\"bbox_2d\": [72, 661, 306, 746], \"label\": \"Text\"}, {\"bbox_2d\": [72, 789, 270, 808], \"label\": \"Text\"}, {\"bbox_2d\": [72, 811, 303, 897], \"label\": \"Text\"}, {\"bbox_2d\": [369, 811, 594, 940], \"label\": \"Text\"}, {\"bbox_2d\": [369, 708, 615, 789], \"label\": \"Text\"}, {\"bbox_2d\": [369, 556, 593, 640], \"label\": \"Text\"}, {\"bbox_2d\": [369, 508, 593, 550], \"label\": \"Text\"}, {\"bbox_2d\": [369, 313, 565, 355], \"label\": \"Text\"}, {\"bbox_2d\": [369, 149, 593, 294], \"label\": \"Text\"}, {\"bbox_2d\": [665, 149, 854, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [665, 269, 876, 372], \"label\": \"Text\"}, {\"bbox_2d\": [665, 391, 919, 419], \"label\": \"Section-header\"}, {\"bbox_2d\": [665, 510, 873, 538], \"label\": \"Section-header\"}, {\"bbox_2d\": [665, 661, 842, 693], \"label\": \"Section-header\"}, {\"bbox_2d\": [665, 542, 888, 644], \"label\": \"Text\"}, {\"bbox_2d\": [665, 695, 892, 764], \"label\": \"Text\"}, {\"bbox_2d\": [665, 786, 881, 812], \"label\": \"Section-header\"}, {\"bbox_2d\": [665, 816, 888, 885], \"label\": \"Text\"}, {\"bbox_2d\": [72, 999, 169, 1013], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [43, 999, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [72, 149, 268, 190], \"label\": \"Section-header\"}, {\"bbox_2d\": [369, 149, 565, 205], \"label\": \"Section-header\"}, {\"bbox_2d\": [666, 149, 855, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 315, 268, 356], \"label\": \"Section-header\"}, {\"bbox_2d\": [369, 314, 565, 355], \"label\": \"Section-header\"}, {\"bbox_2d\": [666, 270, 873, 297], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 465, 268, 507], \"label\": \"Section-header\"}, {\"bbox_2d\": [369, 510, 565, 551], \"label\": \"Section-header\"}, {\"bbox_2d\": [666, 513, 875, 540], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 616, 268, 657], \"label\": \"Section-header\"}, {\"bbox_2d\": [369, 660, 565, 701], \"label\": \"Section-header\"}, {\"bbox_2d\": [666, 664, 844, 691], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 766, 268, 808], \"label\": \"Section-header\"}, {\"bbox_2d\": [369, 811, 569, 852], \"label\": \"Section-header\"}, {\"bbox_2d\": [666, 785, 877, 812], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 812, 301, 896], \"label\": \"Text\"}, {\"bbox_2d\": [369, 856, 596, 941], \"label\": \"Text\"}, {\"bbox_2d\": [666, 816, 884, 886], \"label\": \"Text\"}, {\"bbox_2d\": [369, 706, 619, 790], \"label\": \"Text\"}, {\"bbox_2d\": [666, 695, 885, 764], \"label\": \"Text\"}, {\"bbox_2d\": [72, 661, 308, 746], \"label\": \"Text\"}, {\"bbox_2d\": [666, 544, 884, 643], \"label\": \"Text\"}, {\"bbox_2d\": [369, 555, 591, 639], \"label\": \"Text\"}, {\"bbox_2d\": [72, 511, 289, 595], \"label\": \"Text\"}, {\"bbox_2d\": [666, 423, 881, 492], \"label\": \"Text\"}, {\"bbox_2d\": [369, 360, 610, 489], \"label\": \"Text\"}, {\"bbox_2d\": [72, 360, 293, 445], \"label\": \"Text\"}, {\"bbox_2d\": [666, 302, 893, 371], \"label\": \"Text\"}, {\"bbox_2d\": [666, 391, 919, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [666, 180, 868, 250], \"label\": \"Text\"}, {\"bbox_2d\": [369, 209, 591, 293], \"label\": \"Text\"}, {\"bbox_2d\": [72, 195, 304, 294], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 85, 675, 98], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 316, 926, 410], \"label\": \"Text\"}, {\"bbox_2d\": [709, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 96, 933, 289], \"label\": \"Table\"}, {\"bbox_2d\": [292, 420, 934, 571], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [293, 86, 677, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 99, 932, 288], \"label\": \"Table\"}, {\"bbox_2d\": [292, 420, 932, 569], \"label\": \"Table\"}, {\"bbox_2d\": [293, 318, 926, 410], \"label\": \"List-item\"}, {\"bbox_2d\": [709, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [59, 35, 101, 73], \"label\": \"Page-header\"}, {\"bbox_2d\": [59, 67, 240, 79], \"label\": \"Text\"}, {\"bbox_2d\": [80, 192, 333, 205], \"label\": \"Section-header\"}, {\"bbox_2d\": [80, 144, 373, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 210, 350, 320], \"label\": \"Text\"}, {\"bbox_2d\": [79, 327, 351, 461], \"label\": \"Text\"}, {\"bbox_2d\": [380, 210, 646, 318], \"label\": \"Text\"}, {\"bbox_2d\": [380, 327, 635, 387], \"label\": \"Text\"}, {\"bbox_2d\": [683, 361, 953, 435], \"label\": \"Text\"}, {\"bbox_2d\": [380, 392, 636, 476], \"label\": \"List-item\"}, {\"bbox_2d\": [683, 210, 941, 232], \"label\": \"List-item\"}, {\"bbox_2d\": [683, 237, 936, 260], \"label\": \"List-item\"}, {\"bbox_2d\": [683, 265, 934, 351], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 533, 200, 550], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 563, 959, 709], \"label\": \"Table\"}, {\"bbox_2d\": [79, 982, 192, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [22, 975, 61, 1001], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [57, 34, 240, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 143, 373, 162], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 984, 193, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [13, 975, 66, 1009], \"label\": \"Picture\"}, {\"bbox_2d\": [77, 524, 959, 707], \"label\": \"Table\"}, {\"bbox_2d\": [78, 189, 333, 205], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 209, 350, 319], \"label\": \"Text\"}, {\"bbox_2d\": [78, 327, 354, 461], \"label\": \"Text\"}, {\"bbox_2d\": [381, 209, 648, 319], \"label\": \"Text\"}, {\"bbox_2d\": [381, 327, 654, 387], \"label\": \"Text\"}, {\"bbox_2d\": [381, 391, 645, 477], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 209, 942, 233], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 237, 939, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 265, 952, 350], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 362, 955, 435], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [438, 74, 587, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [694, 93, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 326, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 113, 351, 123], \"label\": \"Text\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [127, 127, 430, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [574, 127, 902, 969], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [438, 73, 587, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [299, 93, 331, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [694, 93, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [282, 113, 351, 123], \"label\": \"Text\"}, {\"bbox_2d\": [539, 126, 895, 308], \"label\": \"Picture\"}, {\"bbox_2d\": [170, 128, 502, 514], \"label\": \"Picture\"}, {\"bbox_2d\": [585, 313, 893, 479], \"label\": \"Picture\"}, {\"bbox_2d\": [525, 485, 895, 662], \"label\": \"Picture\"}, {\"bbox_2d\": [578, 665, 894, 831], \"label\": \"Picture\"}, {\"bbox_2d\": [569, 836, 896, 967], \"label\": \"Picture\"}, {\"bbox_2d\": [126, 810, 499, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [145, 577, 501, 779], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [135, 110, 551, 226], \"label\": \"Text\"}, {\"bbox_2d\": [135, 226, 551, 751], \"label\": \"Text\"}, {\"bbox_2d\": [135, 751, 555, 986], \"label\": \"Text\"}, {\"bbox_2d\": [588, 110, 1001, 303], \"label\": \"Text\"}, {\"bbox_2d\": [588, 301, 1001, 695], \"label\": \"Text\"}, {\"bbox_2d\": [588, 693, 1006, 985], \"label\": \"Text\"}, {\"bbox_2d\": [794, 68, 996, 101], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [793, 69, 1016, 100], \"label\": \"Page-header\"}, {\"bbox_2d\": [587, 112, 1007, 299], \"label\": \"Text\"}, {\"bbox_2d\": [135, 113, 559, 224], \"label\": \"Text\"}, {\"bbox_2d\": [135, 225, 562, 749], \"label\": \"Text\"}, {\"bbox_2d\": [588, 299, 1015, 693], \"label\": \"Text\"}, {\"bbox_2d\": [592, 693, 1018, 983], \"label\": \"Text\"}, {\"bbox_2d\": [138, 750, 568, 985], \"label\": \"Text\"}, {\"bbox_2d\": [561, 993, 594, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 168], \"label\": \"Text\"}, {\"bbox_2d\": [231, 184, 719, 197], \"label\": \"Text\"}, {\"bbox_2d\": [241, 222, 923, 312], \"label\": \"Text\"}, {\"bbox_2d\": [231, 380, 904, 422], \"label\": \"Text\"}, {\"bbox_2d\": [231, 440, 926, 499], \"label\": \"Text\"}, {\"bbox_2d\": [231, 561, 838, 588], \"label\": \"Text\"}, {\"bbox_2d\": [231, 630, 926, 672], \"label\": \"Text\"}, {\"bbox_2d\": [231, 690, 700, 702], \"label\": \"Text\"}, {\"bbox_2d\": [231, 783, 919, 810], \"label\": \"Text\"}, {\"bbox_2d\": [231, 886, 888, 913], \"label\": \"Text\"}, {\"bbox_2d\": [231, 925, 578, 936], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 941, 633, 952], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 821, 829, 833], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 842, 909, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 713, 701, 725], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 729, 730, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 761, 517, 775], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 525, 487, 541], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 609, 563, 623], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 346, 562, 361], \"label\": \"Section-header\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 926, 168], \"label\": \"Text\"}, {\"bbox_2d\": [231, 185, 718, 197], \"label\": \"Text\"}, {\"bbox_2d\": [241, 221, 916, 312], \"label\": \"Text\"}, {\"bbox_2d\": [231, 380, 903, 423], \"label\": \"Text\"}, {\"bbox_2d\": [231, 440, 936, 499], \"label\": \"Text\"}, {\"bbox_2d\": [231, 560, 842, 588], \"label\": \"Text\"}, {\"bbox_2d\": [231, 629, 926, 673], \"label\": \"Text\"}, {\"bbox_2d\": [231, 690, 700, 702], \"label\": \"Text\"}, {\"bbox_2d\": [231, 781, 919, 809], \"label\": \"Text\"}, {\"bbox_2d\": [231, 886, 889, 913], \"label\": \"Text\"}, {\"bbox_2d\": [231, 939, 631, 951], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 924, 573, 936], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 841, 909, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 819, 824, 831], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 712, 699, 724], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 728, 729, 740], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 345, 562, 361], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 526, 487, 541], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 610, 563, 624], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 761, 516, 776], \"label\": \"Section-header\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 26, 351, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 92, 222, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 336, 379, 352], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 582, 201, 596], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 821, 428, 832], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 107, 506, 179], \"label\": \"Text\"}, {\"bbox_2d\": [528, 92, 944, 119], \"label\": \"Text\"}, {\"bbox_2d\": [528, 143, 944, 243], \"label\": \"Text\"}, {\"bbox_2d\": [528, 253, 944, 294], \"label\": \"Text\"}, {\"bbox_2d\": [91, 187, 506, 259], \"label\": \"Text\"}, {\"bbox_2d\": [91, 267, 506, 307], \"label\": \"Text\"}, {\"bbox_2d\": [91, 354, 713, 367], \"label\": \"Text\"}, {\"bbox_2d\": [528, 526, 944, 583], \"label\": \"Text\"}, {\"bbox_2d\": [91, 526, 506, 568], \"label\": \"Text\"}, {\"bbox_2d\": [91, 597, 506, 696], \"label\": \"Text\"}, {\"bbox_2d\": [91, 705, 482, 718], \"label\": \"Text\"}, {\"bbox_2d\": [91, 718, 506, 746], \"label\": \"Text\"}, {\"bbox_2d\": [91, 746, 506, 775], \"label\": \"Text\"}, {\"bbox_2d\": [91, 775, 506, 804], \"label\": \"Text\"}, {\"bbox_2d\": [528, 769, 917, 779], \"label\": \"Text\"}, {\"bbox_2d\": [528, 596, 926, 609], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 378, 958, 512], \"label\": \"Table\"}, {\"bbox_2d\": [528, 613, 952, 760], \"label\": \"Table\"}, {\"bbox_2d\": [87, 842, 513, 930], \"label\": \"Table\"}, {\"bbox_2d\": [552, 842, 947, 931], \"label\": \"Table\"}, {\"bbox_2d\": [91, 939, 482, 950], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 951, 478, 962], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 999, 445, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [89, 842, 233, 856], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 223, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 95, 944, 119], \"label\": \"Text\"}, {\"bbox_2d\": [91, 109, 507, 177], \"label\": \"Text\"}, {\"bbox_2d\": [91, 189, 507, 257], \"label\": \"Text\"}, {\"bbox_2d\": [91, 269, 507, 308], \"label\": \"Text\"}, {\"bbox_2d\": [528, 255, 944, 293], \"label\": \"Text\"}, {\"bbox_2d\": [528, 146, 944, 242], \"label\": \"Text\"}, {\"bbox_2d\": [528, 132, 672, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 339, 380, 353], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 357, 715, 367], \"label\": \"Text\"}, {\"bbox_2d\": [91, 528, 508, 567], \"label\": \"Text\"}, {\"bbox_2d\": [528, 528, 944, 581], \"label\": \"Text\"}, {\"bbox_2d\": [78, 377, 960, 514], \"label\": \"Table\"}, {\"bbox_2d\": [91, 584, 201, 594], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 599, 925, 609], \"label\": \"Text\"}, {\"bbox_2d\": [91, 599, 507, 695], \"label\": \"Text\"}, {\"bbox_2d\": [91, 707, 488, 717], \"label\": \"Text\"}, {\"bbox_2d\": [91, 722, 507, 746], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 751, 507, 775], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 779, 507, 804], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 770, 919, 778], \"label\": \"Text\"}, {\"bbox_2d\": [91, 822, 428, 832], \"label\": \"Section-header\"}, {\"bbox_2d\": [525, 621, 948, 762], \"label\": \"Table\"}, {\"bbox_2d\": [89, 846, 946, 935], \"label\": \"Table\"}, {\"bbox_2d\": [91, 941, 482, 950], \"label\": \"Text\"}, {\"bbox_2d\": [91, 953, 479, 961], \"label\": \"Text\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [698, 93, 722, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 73, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 326, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [142, 113, 480, 252], \"label\": \"Picture\"}, {\"bbox_2d\": [192, 307, 434, 441], \"label\": \"Picture\"}, {\"bbox_2d\": [204, 499, 421, 733], \"label\": \"Picture\"}, {\"bbox_2d\": [156, 731, 470, 976], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 398, 870, 675], \"label\": \"Picture\"}, {\"bbox_2d\": [566, 714, 853, 969], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [438, 73, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 92, 721, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 94, 326, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [142, 110, 492, 977], \"label\": \"Picture\"}, {\"bbox_2d\": [546, 110, 872, 968], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 101, 409, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 120, 918, 898], \"label\": \"Table\"}, {\"bbox_2d\": [43, 997, 170, 1016], \"label\": \"Page-footer\"}, {\"bbox_2d\": [39, 14, 276, 29], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [73, 102, 410, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [37, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 275, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [70, 119, 919, 898], \"label\": \"Table\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [68, 47, 602, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [68, 68, 691, 84], \"label\": \"Text\"}, {\"bbox_2d\": [62, 102, 525, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 131, 905, 175], \"label\": \"Text\"}, {\"bbox_2d\": [94, 188, 912, 232], \"label\": \"Text\"}, {\"bbox_2d\": [94, 245, 912, 274], \"label\": \"Text\"}, {\"bbox_2d\": [94, 308, 900, 351], \"label\": \"Text\"}, {\"bbox_2d\": [94, 364, 910, 422], \"label\": \"Text\"}, {\"bbox_2d\": [94, 435, 907, 493], \"label\": \"Text\"}, {\"bbox_2d\": [94, 527, 910, 626], \"label\": \"Text\"}, {\"bbox_2d\": [94, 660, 901, 733], \"label\": \"Text\"}, {\"bbox_2d\": [94, 640, 196, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 504, 176, 519], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 987, 282, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [69, 45, 601, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [61, 102, 524, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 287, 212, 300], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 505, 178, 519], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 639, 197, 653], \"label\": \"Section-header\"}, {\"bbox_2d\": [69, 70, 690, 85], \"label\": \"Text\"}, {\"bbox_2d\": [94, 132, 915, 174], \"label\": \"Text\"}, {\"bbox_2d\": [94, 188, 921, 230], \"label\": \"Text\"}, {\"bbox_2d\": [94, 245, 922, 273], \"label\": \"Text\"}, {\"bbox_2d\": [94, 308, 909, 350], \"label\": \"Text\"}, {\"bbox_2d\": [94, 365, 921, 421], \"label\": \"Text\"}, {\"bbox_2d\": [94, 435, 917, 491], \"label\": \"Text\"}, {\"bbox_2d\": [94, 527, 914, 625], \"label\": \"Text\"}, {\"bbox_2d\": [94, 661, 911, 731], \"label\": \"Text\"}, {\"bbox_2d\": [60, 987, 282, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [294, 84, 875, 114], \"label\": \"Text\"}, {\"bbox_2d\": [294, 130, 531, 144], \"label\": \"Text\"}, {\"bbox_2d\": [294, 476, 885, 503], \"label\": \"Text\"}, {\"bbox_2d\": [294, 546, 867, 574], \"label\": \"Text\"}, {\"bbox_2d\": [294, 592, 529, 606], \"label\": \"Text\"}, {\"bbox_2d\": [294, 521, 485, 537], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 150, 931, 447], \"label\": \"Picture\"}, {\"bbox_2d\": [294, 612, 931, 907], \"label\": \"Picture\"}, {\"bbox_2d\": [583, 979, 933, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 875, 114], \"label\": \"Text\"}, {\"bbox_2d\": [293, 132, 529, 145], \"label\": \"Text\"}, {\"bbox_2d\": [293, 475, 886, 504], \"label\": \"Text\"}, {\"bbox_2d\": [293, 546, 867, 575], \"label\": \"Text\"}, {\"bbox_2d\": [293, 593, 529, 606], \"label\": \"Text\"}, {\"bbox_2d\": [583, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [317, 617, 844, 899], \"label\": \"Text\"}, {\"bbox_2d\": [317, 156, 844, 426], \"label\": \"Text\"}, {\"bbox_2d\": [217, 522, 485, 539], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [114, 69, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [738, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [118, 102, 907, 370], \"label\": \"Text\"}, {\"bbox_2d\": [119, 381, 907, 481], \"label\": \"Text\"}, {\"bbox_2d\": [169, 494, 628, 509], \"label\": \"Text\"}, {\"bbox_2d\": [121, 520, 907, 565], \"label\": \"Text\"}, {\"bbox_2d\": [169, 603, 264, 620], \"label\": \"Text\"}, {\"bbox_2d\": [121, 632, 908, 954], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [513, 981, 529, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [110, 69, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [740, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [119, 101, 907, 370], \"label\": \"Text\"}, {\"bbox_2d\": [122, 380, 907, 480], \"label\": \"Text\"}, {\"bbox_2d\": [170, 491, 625, 509], \"label\": \"Text\"}, {\"bbox_2d\": [122, 519, 907, 564], \"label\": \"Text\"}, {\"bbox_2d\": [170, 604, 262, 620], \"label\": \"Text\"}, {\"bbox_2d\": [123, 630, 911, 955], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [33, 74, 911, 152], \"label\": \"Text\"}, {\"bbox_2d\": [121, 183, 726, 201], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 308, 686, 326], \"label\": \"Formula\"}, {\"bbox_2d\": [347, 311, 686, 326], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 232, 911, 279], \"label\": \"Text\"}, {\"bbox_2d\": [121, 357, 911, 464], \"label\": \"Text\"}, {\"bbox_2d\": [121, 476, 911, 655], \"label\": \"Text\"}, {\"bbox_2d\": [121, 689, 636, 717], \"label\": \"Formula\"}, {\"bbox_2d\": [401, 672, 636, 718], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 714, 477, 728], \"label\": \"Text\"}, {\"bbox_2d\": [121, 792, 636, 829], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 841, 911, 888], \"label\": \"Text\"}, {\"bbox_2d\": [440, 48, 596, 60], \"label\": \"Page-header\"}, {\"bbox_2d\": [38, 74, 48, 86], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [121, 76, 914, 153], \"label\": \"Text\"}, {\"bbox_2d\": [121, 184, 725, 211], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 232, 914, 286], \"label\": \"Text\"}, {\"bbox_2d\": [121, 310, 688, 336], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 354, 914, 464], \"label\": \"Text\"}, {\"bbox_2d\": [121, 478, 914, 655], \"label\": \"Text\"}, {\"bbox_2d\": [121, 674, 639, 734], \"label\": \"Formula\"}, {\"bbox_2d\": [142, 740, 476, 764], \"label\": \"Text\"}, {\"bbox_2d\": [121, 776, 639, 835], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 842, 914, 900], \"label\": \"Text\"}, {\"bbox_2d\": [121, 48, 129, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [436, 48, 599, 57], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [694, 93, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [299, 93, 333, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [159, 112, 479, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [552, 112, 746, 345], \"label\": \"Picture\"}, {\"bbox_2d\": [677, 112, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [541, 382, 584, 392], \"label\": \"Text\"}, {\"bbox_2d\": [526, 382, 892, 408], \"label\": \"Text\"}, {\"bbox_2d\": [526, 413, 895, 447], \"label\": \"Text\"}, {\"bbox_2d\": [526, 453, 896, 488], \"label\": \"Text\"}, {\"bbox_2d\": [526, 493, 896, 588], \"label\": \"Text\"}, {\"bbox_2d\": [526, 593, 896, 653], \"label\": \"Text\"}, {\"bbox_2d\": [526, 658, 896, 676], \"label\": \"Text\"}, {\"bbox_2d\": [526, 682, 895, 762], \"label\": \"Text\"}, {\"bbox_2d\": [526, 767, 896, 789], \"label\": \"Text\"}, {\"bbox_2d\": [526, 793, 896, 803], \"label\": \"Text\"}, {\"bbox_2d\": [502, 806, 896, 967], \"label\": \"Picture\"}, {\"bbox_2d\": [386, 498, 497, 508], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [440, 74, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [299, 93, 333, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [694, 93, 728, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 113, 351, 123], \"label\": \"Text\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [781, 792, 893, 800], \"label\": \"Text\"}, {\"bbox_2d\": [541, 370, 586, 380], \"label\": \"Text\"}, {\"bbox_2d\": [526, 380, 895, 407], \"label\": \"Text\"}, {\"bbox_2d\": [524, 411, 895, 448], \"label\": \"Text\"}, {\"bbox_2d\": [526, 452, 895, 489], \"label\": \"Text\"}, {\"bbox_2d\": [526, 493, 895, 585], \"label\": \"Text\"}, {\"bbox_2d\": [526, 589, 895, 654], \"label\": \"Text\"}, {\"bbox_2d\": [524, 657, 896, 762], \"label\": \"Text\"}, {\"bbox_2d\": [556, 124, 740, 344], \"label\": \"Picture\"}, {\"bbox_2d\": [131, 124, 495, 496], \"label\": \"Picture\"}, {\"bbox_2d\": [140, 506, 484, 967], \"label\": \"Picture\"}, {\"bbox_2d\": [386, 498, 497, 506], \"label\": \"Text\"}, {\"bbox_2d\": [526, 809, 889, 967], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [131, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [507, 93, 522, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [772, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 142, 501, 321], \"label\": \"Text\"}, {\"bbox_2d\": [131, 326, 501, 447], \"label\": \"Text\"}, {\"bbox_2d\": [131, 444, 501, 572], \"label\": \"Text\"}, {\"bbox_2d\": [131, 577, 501, 697], \"label\": \"Text\"}, {\"bbox_2d\": [131, 703, 501, 838], \"label\": \"Text\"}, {\"bbox_2d\": [131, 844, 501, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 895, 245], \"label\": \"Text\"}, {\"bbox_2d\": [526, 249, 894, 359], \"label\": \"Text\"}, {\"bbox_2d\": [526, 363, 895, 508], \"label\": \"Text\"}, {\"bbox_2d\": [526, 513, 895, 663], \"label\": \"Text\"}, {\"bbox_2d\": [526, 668, 895, 825], \"label\": \"Text\"}, {\"bbox_2d\": [526, 830, 895, 982], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 93, 522, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [771, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 855, 898, 982], \"label\": \"Text\"}, {\"bbox_2d\": [130, 867, 506, 982], \"label\": \"Text\"}, {\"bbox_2d\": [130, 762, 500, 863], \"label\": \"Text\"}, {\"bbox_2d\": [526, 666, 895, 845], \"label\": \"Text\"}, {\"bbox_2d\": [130, 682, 506, 758], \"label\": \"Text\"}, {\"bbox_2d\": [130, 576, 500, 678], \"label\": \"Text\"}, {\"bbox_2d\": [526, 516, 896, 657], \"label\": \"Text\"}, {\"bbox_2d\": [526, 367, 896, 508], \"label\": \"Text\"}, {\"bbox_2d\": [526, 270, 895, 358], \"label\": \"Text\"}, {\"bbox_2d\": [527, 251, 680, 262], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 895, 243], \"label\": \"Text\"}, {\"bbox_2d\": [130, 445, 500, 572], \"label\": \"Text\"}, {\"bbox_2d\": [131, 326, 503, 440], \"label\": \"Text\"}, {\"bbox_2d\": [130, 155, 500, 322], \"label\": \"Text\"}, {\"bbox_2d\": [132, 142, 293, 152], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [422, 22, 893, 66], \"label\": \"Picture\"}, {\"bbox_2d\": [591, 51, 736, 64], \"label\": \"Text\"}, {\"bbox_2d\": [130, 72, 546, 110], \"label\": \"Title\"}, {\"bbox_2d\": [566, 97, 890, 110], \"label\": \"Title\"}, {\"bbox_2d\": [566, 114, 890, 129], \"label\": \"Title\"}, {\"bbox_2d\": [130, 157, 497, 207], \"label\": \"Title\"}, {\"bbox_2d\": [526, 157, 842, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [628, 230, 792, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 222, 480, 230], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 248, 492, 387], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 406, 480, 417], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 436, 311, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 467, 330, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 519, 499, 541], \"label\": \"List-item\"}, {\"bbox_2d\": [214, 500, 412, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 353, 751, 363], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 369, 894, 434], \"label\": \"Text\"}, {\"bbox_2d\": [542, 436, 894, 460], \"label\": \"Text\"}, {\"bbox_2d\": [526, 506, 894, 541], \"label\": \"Text\"}, {\"bbox_2d\": [562, 497, 894, 504], \"label\": \"Text\"}, {\"bbox_2d\": [562, 471, 894, 489], \"label\": \"Text\"}, {\"bbox_2d\": [562, 495, 894, 502], \"label\": \"Text\"}, {\"bbox_2d\": [562, 468, 894, 485], \"label\": \"Text\"}, {\"bbox_2d\": [526, 183, 894, 206], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 206, 894, 221], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 221, 894, 228], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 251, 770, 298], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 298, 894, 345], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 353, 751, 363], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 512, 894, 541], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [421, 20, 892, 50], \"label\": \"Picture\"}, {\"bbox_2d\": [593, 52, 736, 63], \"label\": \"Text\"}, {\"bbox_2d\": [131, 69, 552, 112], \"label\": \"Title\"}, {\"bbox_2d\": [163, 115, 340, 129], \"label\": \"Title\"}, {\"bbox_2d\": [562, 97, 898, 110], \"label\": \"Title\"}, {\"bbox_2d\": [562, 114, 895, 130], \"label\": \"Title\"}, {\"bbox_2d\": [131, 157, 498, 207], \"label\": \"Title\"}, {\"bbox_2d\": [131, 221, 481, 231], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 246, 497, 387], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 407, 481, 417], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 437, 315, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 468, 332, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 518, 500, 542], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 506, 895, 542], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 422, 894, 459], \"label\": \"List-item\"}, {\"bbox_2d\": [561, 468, 892, 497], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 304, 894, 340], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 250, 617, 260], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 157, 844, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [625, 231, 795, 241], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 353, 754, 363], \"label\": \"Section-header\"}, {\"bbox_2d\": [214, 499, 416, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 370, 895, 419], \"label\": \"Text\"}, {\"bbox_2d\": [567, 262, 773, 300], \"label\": \"Table\"}, {\"bbox_2d\": [538, 181, 895, 220], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [78, 41, 910, 73], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 122, 933, 255], \"label\": \"Text\"}, {\"bbox_2d\": [125, 272, 933, 376], \"label\": \"Text\"}, {\"bbox_2d\": [125, 393, 935, 542], \"label\": \"Text\"}, {\"bbox_2d\": [125, 559, 925, 708], \"label\": \"Text\"}, {\"bbox_2d\": [122, 760, 931, 894], \"label\": \"Text\"}, {\"bbox_2d\": [122, 912, 893, 941], \"label\": \"Text\"}, {\"bbox_2d\": [122, 731, 299, 744], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 990, 439, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [77, 43, 907, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 123, 938, 256], \"label\": \"Text\"}, {\"bbox_2d\": [125, 274, 934, 376], \"label\": \"Text\"}, {\"bbox_2d\": [125, 394, 935, 541], \"label\": \"Text\"}, {\"bbox_2d\": [123, 761, 930, 894], \"label\": \"Text\"}, {\"bbox_2d\": [122, 915, 893, 942], \"label\": \"Text\"}, {\"bbox_2d\": [123, 732, 299, 744], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 560, 924, 707], \"label\": \"Text\"}, {\"bbox_2d\": [62, 990, 438, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 987, 684, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 498, 465, 517], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 93, 866, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 131, 658, 144], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 154, 679, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 176, 880, 203], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 213, 923, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 283, 923, 343], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 353, 864, 366], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 375, 688, 388], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 397, 750, 410], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 419, 868, 432], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 594, 527, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 724, 595, 736], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 814, 399, 826], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 539, 907, 584], \"label\": \"Text\"}, {\"bbox_2d\": [231, 447, 821, 459], \"label\": \"Text\"}, {\"bbox_2d\": [255, 617, 915, 675], \"label\": \"Text\"}, {\"bbox_2d\": [255, 685, 923, 712], \"label\": \"Text\"}, {\"bbox_2d\": [255, 744, 923, 803], \"label\": \"Text\"}, {\"bbox_2d\": [255, 837, 923, 910], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 871, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 131, 659, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 153, 679, 165], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 175, 884, 203], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 213, 926, 272], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 283, 926, 342], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 352, 865, 364], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 374, 689, 386], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 397, 749, 409], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 419, 869, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 593, 527, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 722, 595, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 814, 399, 826], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 685, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 836, 926, 911], \"label\": \"Text\"}, {\"bbox_2d\": [255, 744, 926, 804], \"label\": \"Text\"}, {\"bbox_2d\": [255, 684, 926, 712], \"label\": \"Text\"}, {\"bbox_2d\": [255, 615, 919, 674], \"label\": \"Text\"}, {\"bbox_2d\": [231, 539, 913, 583], \"label\": \"Text\"}, {\"bbox_2d\": [231, 448, 822, 460], \"label\": \"Text\"}, {\"bbox_2d\": [109, 497, 466, 516], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [409, 122, 749, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [187, 122, 334, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [309, 258, 495, 290], \"label\": \"Title\"}, {\"bbox_2d\": [546, 294, 616, 309], \"label\": \"Title\"}, {\"bbox_2d\": [332, 319, 828, 341], \"label\": \"Text\"}, {\"bbox_2d\": [310, 352, 854, 377], \"label\": \"Text\"}, {\"bbox_2d\": [309, 385, 505, 396], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 402, 854, 429], \"label\": \"Text\"}, {\"bbox_2d\": [309, 435, 854, 460], \"label\": \"Section-header\"}, {\"bbox_2d\": [310, 466, 854, 490], \"label\": \"Text\"}, {\"bbox_2d\": [310, 496, 796, 507], \"label\": \"Text\"}, {\"bbox_2d\": [343, 507, 854, 529], \"label\": \"Text\"}, {\"bbox_2d\": [327, 547, 545, 558], \"label\": \"Text\"}, {\"bbox_2d\": [310, 817, 752, 875], \"label\": \"Text\"}, {\"bbox_2d\": [377, 492, 796, 503], \"label\": \"Text\"}, {\"bbox_2d\": [188, 317, 287, 343], \"label\": \"Text\"}, {\"bbox_2d\": [188, 367, 287, 441], \"label\": \"Text\"}, {\"bbox_2d\": [311, 813, 609, 827], \"label\": \"Section-header\"}, {\"bbox_2d\": [310, 833, 754, 875], \"label\": \"Text\"}, {\"bbox_2d\": [572, 883, 587, 893], \"label\": \"Page-footer\"}, {\"bbox_2d\": [81, 8, 187, 44], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [409, 122, 754, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [187, 122, 334, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [309, 259, 497, 290], \"label\": \"Title\"}, {\"bbox_2d\": [546, 294, 618, 309], \"label\": \"Title\"}, {\"bbox_2d\": [332, 319, 829, 340], \"label\": \"Text\"}, {\"bbox_2d\": [309, 352, 857, 378], \"label\": \"Text\"}, {\"bbox_2d\": [309, 402, 857, 427], \"label\": \"Text\"}, {\"bbox_2d\": [309, 465, 857, 490], \"label\": \"Text\"}, {\"bbox_2d\": [377, 491, 796, 503], \"label\": \"Text\"}, {\"bbox_2d\": [343, 504, 858, 529], \"label\": \"Text\"}, {\"bbox_2d\": [326, 546, 546, 558], \"label\": \"Text\"}, {\"bbox_2d\": [199, 319, 285, 329], \"label\": \"Text\"}, {\"bbox_2d\": [209, 334, 275, 344], \"label\": \"Text\"}, {\"bbox_2d\": [309, 386, 506, 395], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 435, 854, 458], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 817, 612, 827], \"label\": \"Footnote\"}, {\"bbox_2d\": [309, 833, 753, 843], \"label\": \"Footnote\"}, {\"bbox_2d\": [309, 843, 610, 874], \"label\": \"Footnote\"}, {\"bbox_2d\": [571, 880, 593, 896], \"label\": \"Page-footer\"}, {\"bbox_2d\": [187, 367, 297, 440], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 25, 156, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [42, 84, 193, 94], \"label\": \"Section-header\"}, {\"bbox_2d\": [42, 214, 1009, 236], \"label\": \"Text\"}, {\"bbox_2d\": [42, 494, 1009, 516], \"label\": \"Text\"}, {\"bbox_2d\": [42, 526, 1009, 548], \"label\": \"Text\"}, {\"bbox_2d\": [508, 570, 525, 580], \"label\": \"Page-footer\"}, {\"bbox_2d\": [24, 114, 1011, 204], \"label\": \"Table\"}, {\"bbox_2d\": [24, 277, 1011, 483], \"label\": \"Table\"}, {\"bbox_2d\": [42, 250, 190, 260], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [43, 84, 192, 95], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 113, 1012, 202], \"label\": \"Table\"}, {\"bbox_2d\": [25, 214, 1009, 236], \"label\": \"Text\"}, {\"bbox_2d\": [43, 249, 190, 260], \"label\": \"Section-header\"}, {\"bbox_2d\": [22, 278, 1009, 481], \"label\": \"Table\"}, {\"bbox_2d\": [25, 494, 1009, 516], \"label\": \"Text\"}, {\"bbox_2d\": [25, 525, 1009, 547], \"label\": \"Text\"}, {\"bbox_2d\": [508, 570, 525, 581], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 139, 103, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 134, 496, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 233, 343, 246], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 255, 522, 517], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [91, 137, 104, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 132, 497, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 231, 343, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 253, 522, 517], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [61, 45, 220, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [992, 350, 1012, 721], \"label\": \"Page-header\"}, {\"bbox_2d\": [508, 994, 532, 1005], \"label\": \"Page-footer\"}, {\"bbox_2d\": [58, 74, 504, 814], \"label\": \"Table\"}, {\"bbox_2d\": [524, 71, 897, 504], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [60, 41, 221, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 70, 975, 790], \"label\": \"Table\"}, {\"bbox_2d\": [995, 344, 1014, 713], \"label\": \"Page-header\"}, {\"bbox_2d\": [508, 990, 527, 1011], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 93, 176, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 302, 312, 313], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 366, 461, 392], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 559, 225, 572], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 702, 267, 714], \"label\": \"Section-header\"}, {\"bbox_2d\": [583, 999, 946, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 108, 507, 149], \"label\": \"Text\"}, {\"bbox_2d\": [91, 160, 507, 287], \"label\": \"Text\"}, {\"bbox_2d\": [91, 325, 507, 350], \"label\": \"Text\"}, {\"bbox_2d\": [91, 396, 507, 479], \"label\": \"Text\"}, {\"bbox_2d\": [91, 490, 507, 544], \"label\": \"Text\"}, {\"bbox_2d\": [91, 575, 507, 687], \"label\": \"Text\"}, {\"bbox_2d\": [91, 717, 507, 858], \"label\": \"Text\"}, {\"bbox_2d\": [1004, 95, 1014, 338], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 343], \"label\": \"Page-header\"}, {\"bbox_2d\": [584, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 95, 177, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 303, 313, 313], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 367, 462, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 561, 226, 571], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 703, 268, 713], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 718, 507, 857], \"label\": \"Text\"}, {\"bbox_2d\": [91, 575, 507, 686], \"label\": \"Text\"}, {\"bbox_2d\": [91, 490, 507, 543], \"label\": \"Text\"}, {\"bbox_2d\": [91, 396, 507, 478], \"label\": \"Text\"}, {\"bbox_2d\": [91, 325, 507, 350], \"label\": \"Text\"}, {\"bbox_2d\": [91, 160, 507, 286], \"label\": \"Text\"}, {\"bbox_2d\": [91, 109, 507, 148], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 384], \"label\": \"Text\"}, {\"bbox_2d\": [121, 395, 911, 904], \"label\": \"Text\"}, {\"bbox_2d\": [121, 915, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 384], \"label\": \"Text\"}, {\"bbox_2d\": [121, 396, 911, 904], \"label\": \"Text\"}, {\"bbox_2d\": [121, 916, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [670, 70, 915, 253], \"label\": \"Picture\"}, {\"bbox_2d\": [77, 69, 656, 162], \"label\": \"Text\"}, {\"bbox_2d\": [77, 177, 656, 242], \"label\": \"Text\"}, {\"bbox_2d\": [77, 256, 915, 306], \"label\": \"Text\"}, {\"bbox_2d\": [77, 320, 915, 358], \"label\": \"Text\"}, {\"bbox_2d\": [77, 373, 915, 492], \"label\": \"Text\"}, {\"bbox_2d\": [77, 507, 151, 514], \"label\": \"Text\"}, {\"bbox_2d\": [77, 671, 915, 724], \"label\": \"Text\"}, {\"bbox_2d\": [77, 736, 915, 800], \"label\": \"Text\"}, {\"bbox_2d\": [77, 816, 915, 882], \"label\": \"Text\"}, {\"bbox_2d\": [77, 896, 915, 934], \"label\": \"Text\"}, {\"bbox_2d\": [967, 982, 981, 994], \"label\": \"Page-footer\"}, {\"bbox_2d\": [77, 501, 921, 652], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [76, 70, 657, 163], \"label\": \"Text\"}, {\"bbox_2d\": [76, 176, 657, 242], \"label\": \"Text\"}, {\"bbox_2d\": [76, 255, 914, 307], \"label\": \"Text\"}, {\"bbox_2d\": [76, 321, 914, 359], \"label\": \"Text\"}, {\"bbox_2d\": [76, 372, 914, 493], \"label\": \"Text\"}, {\"bbox_2d\": [76, 670, 914, 722], \"label\": \"Text\"}, {\"bbox_2d\": [76, 735, 914, 802], \"label\": \"Text\"}, {\"bbox_2d\": [76, 815, 914, 881], \"label\": \"Text\"}, {\"bbox_2d\": [76, 895, 914, 933], \"label\": \"Text\"}, {\"bbox_2d\": [966, 982, 981, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [675, 92, 915, 254], \"label\": \"Picture\"}, {\"bbox_2d\": [693, 78, 861, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 503, 580, 650], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [439, 73, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [695, 93, 728, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [301, 93, 333, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [185, 113, 444, 551], \"label\": \"Picture\"}, {\"bbox_2d\": [131, 569, 503, 709], \"label\": \"Text\"}, {\"bbox_2d\": [130, 716, 501, 801], \"label\": \"Text\"}, {\"bbox_2d\": [526, 699, 896, 879], \"label\": \"Text\"}, {\"bbox_2d\": [526, 883, 896, 982], \"label\": \"Text\"}, {\"bbox_2d\": [131, 947, 501, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 591, 896, 699], \"label\": \"Text\"}, {\"bbox_2d\": [526, 425, 896, 586], \"label\": \"Text\"}, {\"bbox_2d\": [526, 331, 896, 423], \"label\": \"Text\"}, {\"bbox_2d\": [526, 188, 895, 330], \"label\": \"Text\"}, {\"bbox_2d\": [526, 113, 895, 187], \"label\": \"Text\"}, {\"bbox_2d\": [282, 113, 350, 123], \"label\": \"Text\"}, {\"bbox_2d\": [475, 832, 495, 841], \"label\": \"Text\"}, {\"bbox_2d\": [259, 839, 349, 925], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 287, 895, 306], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [438, 73, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [299, 93, 331, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [694, 93, 726, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [186, 110, 442, 548], \"label\": \"Picture\"}, {\"bbox_2d\": [130, 568, 520, 665], \"label\": \"Text\"}, {\"bbox_2d\": [130, 667, 501, 802], \"label\": \"Text\"}, {\"bbox_2d\": [525, 110, 894, 186], \"label\": \"Text\"}, {\"bbox_2d\": [524, 189, 894, 329], \"label\": \"Text\"}, {\"bbox_2d\": [524, 332, 895, 617], \"label\": \"Text\"}, {\"bbox_2d\": [524, 619, 895, 694], \"label\": \"Text\"}, {\"bbox_2d\": [525, 698, 895, 878], \"label\": \"Text\"}, {\"bbox_2d\": [526, 881, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [258, 831, 500, 925], \"label\": \"Picture\"}, {\"bbox_2d\": [130, 946, 500, 982], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [795, 37, 975, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [998, 207, 1013, 339], \"label\": \"Page-header\"}, {\"bbox_2d\": [79, 192, 181, 203], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 208, 938, 230], \"label\": \"Text\"}, {\"bbox_2d\": [79, 568, 942, 590], \"label\": \"Text\"}, {\"bbox_2d\": [79, 553, 158, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 664, 119, 675], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 782, 292, 791], \"label\": \"Section-header\"}, {\"bbox_2d\": [942, 994, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [79, 682, 135, 692], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 418, 292, 428], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 321, 135, 331], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 301, 120, 311], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 337, 161, 406], \"label\": \"Text\"}, {\"bbox_2d\": [79, 436, 165, 534], \"label\": \"Text\"}, {\"bbox_2d\": [279, 261, 941, 533], \"label\": \"Table\"}, {\"bbox_2d\": [79, 624, 956, 898], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [795, 34, 976, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 190, 181, 202], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 551, 159, 563], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 207, 942, 231], \"label\": \"Text\"}, {\"bbox_2d\": [78, 568, 946, 592], \"label\": \"Text\"}, {\"bbox_2d\": [77, 240, 958, 539], \"label\": \"Table\"}, {\"bbox_2d\": [78, 601, 957, 899], \"label\": \"Table\"}, {\"bbox_2d\": [1002, 206, 1018, 334], \"label\": \"Page-header\"}, {\"bbox_2d\": [943, 991, 991, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 905, 136], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 154, 927, 586], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 590, 628, 603], \"label\": \"Caption\"}, {\"bbox_2d\": [671, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 911, 136], \"label\": \"List-item\"}, {\"bbox_2d\": [251, 155, 929, 588], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 591, 627, 601], \"label\": \"Caption\"}, {\"bbox_2d\": [670, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [120, 100, 356, 119], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 149, 432, 163], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 165, 523, 240], \"label\": \"Text\"}, {\"bbox_2d\": [120, 243, 523, 381], \"label\": \"Text\"}, {\"bbox_2d\": [565, 149, 967, 238], \"label\": \"Text\"}, {\"bbox_2d\": [565, 275, 967, 363], \"label\": \"Text\"}, {\"bbox_2d\": [565, 398, 967, 502], \"label\": \"Text\"}, {\"bbox_2d\": [120, 417, 523, 506], \"label\": \"Text\"}, {\"bbox_2d\": [120, 510, 523, 647], \"label\": \"Text\"}, {\"bbox_2d\": [120, 650, 523, 678], \"label\": \"Text\"}, {\"bbox_2d\": [120, 697, 523, 772], \"label\": \"Text\"}, {\"bbox_2d\": [120, 807, 523, 961], \"label\": \"Text\"}, {\"bbox_2d\": [565, 534, 933, 621], \"label\": \"Text\"}, {\"bbox_2d\": [565, 383, 850, 395], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 399, 444, 412], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 257, 831, 270], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 639, 751, 650], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 757, 792, 768], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 791, 401, 803], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 657, 931, 737], \"label\": \"Text\"}, {\"bbox_2d\": [565, 773, 931, 853], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 975, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [120, 101, 356, 118], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 149, 431, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 258, 831, 268], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 383, 850, 394], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 400, 444, 411], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 792, 402, 803], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 698, 386, 709], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 166, 522, 239], \"label\": \"Text\"}, {\"bbox_2d\": [120, 244, 523, 380], \"label\": \"Text\"}, {\"bbox_2d\": [120, 417, 522, 505], \"label\": \"Text\"}, {\"bbox_2d\": [120, 511, 522, 646], \"label\": \"Text\"}, {\"bbox_2d\": [120, 652, 522, 677], \"label\": \"Text\"}, {\"bbox_2d\": [565, 149, 968, 237], \"label\": \"Text\"}, {\"bbox_2d\": [565, 274, 968, 362], \"label\": \"Text\"}, {\"bbox_2d\": [565, 399, 968, 503], \"label\": \"Text\"}, {\"bbox_2d\": [120, 714, 523, 771], \"label\": \"Text\"}, {\"bbox_2d\": [120, 808, 523, 959], \"label\": \"Text\"}, {\"bbox_2d\": [579, 534, 933, 617], \"label\": \"Text\"}, {\"bbox_2d\": [579, 641, 751, 650], \"label\": \"Text\"}, {\"bbox_2d\": [579, 759, 793, 767], \"label\": \"Text\"}, {\"bbox_2d\": [598, 774, 932, 853], \"label\": \"Table\"}, {\"bbox_2d\": [598, 656, 933, 736], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 57, 1009, 83], \"label\": \"Text\"}, {\"bbox_2d\": [25, 88, 1009, 147], \"label\": \"Text\"}, {\"bbox_2d\": [25, 152, 1009, 198], \"label\": \"Text\"}, {\"bbox_2d\": [25, 203, 1009, 240], \"label\": \"Text\"}, {\"bbox_2d\": [25, 271, 1009, 293], \"label\": \"Text\"}, {\"bbox_2d\": [25, 250, 245, 264], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 323, 525, 334], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 60, 1009, 82], \"label\": \"Text\"}, {\"bbox_2d\": [25, 90, 1009, 146], \"label\": \"Text\"}, {\"bbox_2d\": [25, 153, 1009, 198], \"label\": \"Text\"}, {\"bbox_2d\": [25, 205, 1009, 238], \"label\": \"Text\"}, {\"bbox_2d\": [25, 271, 1009, 293], \"label\": \"Text\"}, {\"bbox_2d\": [25, 253, 235, 263], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 325, 525, 336], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [521, 762, 967, 845], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 718, 257, 735], \"label\": \"Section-header\"}, {\"bbox_2d\": [521, 861, 960, 918], \"label\": \"Text\"}, {\"bbox_2d\": [521, 927, 955, 1000], \"label\": \"Text\"}, {\"bbox_2d\": [267, 999, 387, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [61, 999, 264, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [62, 790, 403, 953], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [60, 718, 257, 734], \"label\": \"Section-header\"}, {\"bbox_2d\": [521, 762, 971, 845], \"label\": \"Section-header\"}, {\"bbox_2d\": [521, 861, 963, 917], \"label\": \"Text\"}, {\"bbox_2d\": [521, 928, 960, 999], \"label\": \"Text\"}, {\"bbox_2d\": [60, 1000, 388, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [77, 803, 152, 828], \"label\": \"Text\"}, {\"bbox_2d\": [203, 803, 261, 828], \"label\": \"Text\"}, {\"bbox_2d\": [308, 803, 390, 828], \"label\": \"Text\"}, {\"bbox_2d\": [306, 859, 389, 884], \"label\": \"Text\"}, {\"bbox_2d\": [308, 920, 389, 936], \"label\": \"Text\"}, {\"bbox_2d\": [204, 916, 263, 941], \"label\": \"Text\"}, {\"bbox_2d\": [205, 859, 261, 884], \"label\": \"Text\"}, {\"bbox_2d\": [72, 859, 160, 884], \"label\": \"Text\"}, {\"bbox_2d\": [67, 916, 165, 941], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [172, 74, 937, 106], \"label\": \"Text\"}, {\"bbox_2d\": [172, 151, 740, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 173, 775, 189], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 113, 937, 141], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 196, 937, 225], \"label\": \"Text\"}, {\"bbox_2d\": [98, 235, 937, 310], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 319, 937, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 373, 937, 416], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 426, 538, 438], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 448, 484, 463], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 469, 779, 484], \"label\": \"Text\"}, {\"bbox_2d\": [98, 596, 937, 640], \"label\": \"Text\"}, {\"bbox_2d\": [98, 649, 937, 677], \"label\": \"Text\"}, {\"bbox_2d\": [98, 708, 937, 736], \"label\": \"Text\"}, {\"bbox_2d\": [98, 745, 937, 834], \"label\": \"Text\"}, {\"bbox_2d\": [98, 844, 937, 872], \"label\": \"Text\"}, {\"bbox_2d\": [98, 879, 937, 922], \"label\": \"Text\"}, {\"bbox_2d\": [98, 932, 640, 946], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 494, 923, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 515, 775, 529], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 536, 783, 550], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 557, 937, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 686, 816, 699], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 745, 937, 867], \"label\": \"Text\"}, {\"bbox_2d\": [989, 960, 937, 1000], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [172, 71, 942, 106], \"label\": \"Text\"}, {\"bbox_2d\": [98, 109, 943, 144], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 147, 746, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 170, 779, 190], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 230, 951, 311], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 314, 946, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 367, 946, 403], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 398, 319, 418], \"label\": \"Text\"}, {\"bbox_2d\": [172, 192, 943, 228], \"label\": \"Text\"}, {\"bbox_2d\": [147, 421, 543, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 443, 489, 463], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 927, 646, 947], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 681, 821, 701], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 465, 785, 485], \"label\": \"Text\"}, {\"bbox_2d\": [98, 487, 927, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 532, 790, 552], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 510, 781, 530], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 554, 942, 589], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 592, 946, 641], \"label\": \"Text\"}, {\"bbox_2d\": [98, 644, 945, 679], \"label\": \"Text\"}, {\"bbox_2d\": [98, 703, 945, 738], \"label\": \"Text\"}, {\"bbox_2d\": [98, 741, 945, 835], \"label\": \"Text\"}, {\"bbox_2d\": [98, 837, 946, 872], \"label\": \"Text\"}, {\"bbox_2d\": [98, 875, 947, 925], \"label\": \"Text\"}, {\"bbox_2d\": [895, 956, 942, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 74, 503, 94], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 110, 941, 202], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 218, 941, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 289, 941, 345], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 360, 941, 400], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 415, 941, 455], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 470, 941, 545], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 559, 941, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 631, 941, 671], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 686, 941, 725], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 740, 447, 761], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 776, 513, 797], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 812, 941, 905], \"label\": \"List-item\"}, {\"bbox_2d\": [930, 955, 953, 972], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 975, 182, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [98, 73, 501, 94], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 110, 946, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 218, 943, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 290, 943, 346], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 362, 947, 400], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 416, 944, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 470, 946, 544], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 560, 945, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 632, 944, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 686, 945, 724], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 740, 452, 760], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 812, 944, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 776, 516, 797], \"label\": \"Section-header\"}, {\"bbox_2d\": [913, 955, 959, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 974, 187, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 48, 296, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 125, 918, 176], \"label\": \"Text\"}, {\"bbox_2d\": [121, 176, 918, 259], \"label\": \"Text\"}, {\"bbox_2d\": [121, 441, 918, 540], \"label\": \"Text\"}, {\"bbox_2d\": [121, 640, 917, 689], \"label\": \"Text\"}, {\"bbox_2d\": [121, 690, 917, 789], \"label\": \"Text\"}, {\"bbox_2d\": [121, 789, 276, 804], \"label\": \"Text\"}, {\"bbox_2d\": [147, 804, 917, 887], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 888, 901, 921], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 375, 918, 440], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 276, 917, 375], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 972, 440, 988], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 275, 945, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 375, 951, 440], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 441, 954, 540], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 540, 953, 639], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 805, 950, 887], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 888, 904, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 639, 934, 689], \"label\": \"Text\"}, {\"bbox_2d\": [121, 689, 952, 788], \"label\": \"Text\"}, {\"bbox_2d\": [182, 788, 277, 804], \"label\": \"Text\"}, {\"bbox_2d\": [121, 126, 949, 176], \"label\": \"Text\"}, {\"bbox_2d\": [121, 176, 953, 258], \"label\": \"Text\"}, {\"bbox_2d\": [182, 259, 558, 275], \"label\": \"Text\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [789, 73, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 157, 746, 166], \"label\": \"Text\"}, {\"bbox_2d\": [282, 157, 351, 166], \"label\": \"Text\"}, {\"bbox_2d\": [157, 168, 351, 972], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 170, 809, 969], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 74, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 93, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 74, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [160, 153, 374, 970], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 151, 808, 980], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [74, 119, 476, 137], \"label\": \"Section-header\"}, {\"bbox_2d\": [74, 157, 360, 180], \"label\": \"Text\"}, {\"bbox_2d\": [72, 208, 916, 781], \"label\": \"Table\"}, {\"bbox_2d\": [51, 1000, 223, 1012], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [74, 160, 360, 180], \"label\": \"Text\"}, {\"bbox_2d\": [74, 123, 478, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 207, 919, 782], \"label\": \"Table\"}, {\"bbox_2d\": [12, 770, 22, 860], \"label\": \"Page-header\"}, {\"bbox_2d\": [51, 1002, 223, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [157, 164, 962, 209], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 219, 962, 265], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 275, 962, 319], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 351, 390, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 375, 572, 389], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 399, 433, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 423, 530, 437], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 447, 358, 461], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 471, 609, 485], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 524, 935, 538], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 548, 962, 578], \"label\": \"Text\"}, {\"bbox_2d\": [125, 635, 962, 695], \"label\": \"Text\"}, {\"bbox_2d\": [125, 775, 962, 805], \"label\": \"Text\"}, {\"bbox_2d\": [125, 815, 962, 875], \"label\": \"Text\"}, {\"bbox_2d\": [125, 885, 962, 915], \"label\": \"Text\"}, {\"bbox_2d\": [125, 925, 962, 950], \"label\": \"Text\"}, {\"bbox_2d\": [125, 729, 377, 748], \"label\": \"Section-header\"}, {\"bbox_2d\": [142, 976, 188, 999], \"label\": \"Picture\"}, {\"bbox_2d\": [459, 988, 630, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [867, 988, 960, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [141, 609, 429, 628], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 494, 192, 507], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 528, 935, 538], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 21, 373, 114], \"label\": \"Picture\"}, {\"bbox_2d\": [630, 124, 960, 134], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 327, 810, 341], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [138, 56, 377, 120], \"label\": \"Picture\"}, {\"bbox_2d\": [628, 122, 957, 134], \"label\": \"Page-header\"}, {\"bbox_2d\": [156, 162, 967, 210], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 348, 396, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 372, 577, 389], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 396, 437, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 420, 535, 436], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 444, 363, 460], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 468, 614, 484], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 522, 941, 538], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 607, 437, 628], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 724, 386, 746], \"label\": \"Section-header\"}, {\"bbox_2d\": [458, 987, 631, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [867, 987, 966, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [136, 972, 194, 998], \"label\": \"Picture\"}, {\"bbox_2d\": [125, 920, 967, 951], \"label\": \"Text\"}, {\"bbox_2d\": [125, 881, 967, 912], \"label\": \"Text\"}, {\"bbox_2d\": [125, 812, 967, 874], \"label\": \"Text\"}, {\"bbox_2d\": [125, 773, 967, 804], \"label\": \"Text\"}, {\"bbox_2d\": [125, 632, 967, 694], \"label\": \"Text\"}, {\"bbox_2d\": [125, 546, 966, 577], \"label\": \"Text\"}, {\"bbox_2d\": [125, 492, 192, 507], \"label\": \"Text\"}, {\"bbox_2d\": [125, 326, 816, 341], \"label\": \"Text\"}, {\"bbox_2d\": [125, 217, 967, 264], \"label\": \"Text\"}, {\"bbox_2d\": [125, 271, 967, 318], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [75, 102, 364, 115], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 117, 124, 130], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 233, 142, 246], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 406, 300, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 117, 600, 130], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 134, 462, 209], \"label\": \"Text\"}, {\"bbox_2d\": [517, 134, 906, 209], \"label\": \"Text\"}, {\"bbox_2d\": [72, 249, 464, 355], \"label\": \"Text\"}, {\"bbox_2d\": [517, 212, 910, 272], \"label\": \"Text\"}, {\"bbox_2d\": [72, 443, 915, 551], \"label\": \"Table\"}, {\"bbox_2d\": [60, 999, 170, 1016], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [517, 119, 600, 131], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 104, 364, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 119, 123, 131], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 235, 142, 246], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 407, 300, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 136, 467, 210], \"label\": \"Text\"}, {\"bbox_2d\": [72, 252, 470, 356], \"label\": \"Text\"}, {\"bbox_2d\": [517, 136, 911, 210], \"label\": \"Text\"}, {\"bbox_2d\": [517, 215, 915, 272], \"label\": \"Text\"}, {\"bbox_2d\": [69, 425, 915, 549], \"label\": \"Table\"}, {\"bbox_2d\": [58, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 28, 438, 60], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 65, 627, 77], \"label\": \"Text\"}, {\"bbox_2d\": [85, 95, 427, 108], \"label\": \"Text\"}, {\"bbox_2d\": [85, 266, 966, 294], \"label\": \"Text\"}, {\"bbox_2d\": [85, 324, 933, 369], \"label\": \"Text\"}, {\"bbox_2d\": [85, 385, 961, 492], \"label\": \"Text\"}, {\"bbox_2d\": [85, 523, 960, 584], \"label\": \"Text\"}, {\"bbox_2d\": [85, 507, 334, 520], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 309, 477, 321], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 120, 963, 245], \"label\": \"Table\"}, {\"bbox_2d\": [85, 597, 964, 810], \"label\": \"Table\"}, {\"bbox_2d\": [85, 827, 932, 856], \"label\": \"Text\"}, {\"bbox_2d\": [85, 871, 915, 900], \"label\": \"Text\"}, {\"bbox_2d\": [802, 997, 964, 1005], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 22, 210, 45], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 41, 439, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 304, 477, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 502, 333, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 869, 919, 900], \"label\": \"Text\"}, {\"bbox_2d\": [85, 825, 936, 856], \"label\": \"Text\"}, {\"bbox_2d\": [85, 521, 966, 583], \"label\": \"Text\"}, {\"bbox_2d\": [85, 382, 966, 492], \"label\": \"Text\"}, {\"bbox_2d\": [85, 322, 937, 369], \"label\": \"Text\"}, {\"bbox_2d\": [85, 263, 970, 294], \"label\": \"Text\"}, {\"bbox_2d\": [85, 92, 428, 107], \"label\": \"Text\"}, {\"bbox_2d\": [85, 62, 628, 78], \"label\": \"Text\"}, {\"bbox_2d\": [83, 116, 967, 249], \"label\": \"Table\"}, {\"bbox_2d\": [85, 594, 966, 811], \"label\": \"Table\"}, {\"bbox_2d\": [802, 996, 965, 1006], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 43, 230, 58], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 211, 149, 225], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 453, 143, 466], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 699, 335, 712], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 59, 953, 121], \"label\": \"Text\"}, {\"bbox_2d\": [85, 135, 961, 197], \"label\": \"Text\"}, {\"bbox_2d\": [85, 227, 945, 319], \"label\": \"Text\"}, {\"bbox_2d\": [85, 334, 946, 378], \"label\": \"Text\"}, {\"bbox_2d\": [85, 393, 953, 438], \"label\": \"Text\"}, {\"bbox_2d\": [85, 468, 959, 545], \"label\": \"Text\"}, {\"bbox_2d\": [85, 560, 963, 685], \"label\": \"Text\"}, {\"bbox_2d\": [85, 716, 964, 856], \"label\": \"Text\"}, {\"bbox_2d\": [85, 871, 964, 915], \"label\": \"Text\"}, {\"bbox_2d\": [801, 999, 964, 1008], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [802, 998, 965, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 869, 969, 916], \"label\": \"Text\"}, {\"bbox_2d\": [85, 713, 969, 855], \"label\": \"Text\"}, {\"bbox_2d\": [85, 694, 334, 713], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 558, 968, 684], \"label\": \"Text\"}, {\"bbox_2d\": [85, 466, 963, 545], \"label\": \"Text\"}, {\"bbox_2d\": [85, 448, 145, 467], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 391, 957, 438], \"label\": \"Text\"}, {\"bbox_2d\": [85, 331, 951, 378], \"label\": \"Text\"}, {\"bbox_2d\": [85, 224, 952, 319], \"label\": \"Text\"}, {\"bbox_2d\": [85, 206, 149, 225], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 133, 965, 196], \"label\": \"Text\"}, {\"bbox_2d\": [85, 58, 957, 120], \"label\": \"Text\"}, {\"bbox_2d\": [85, 39, 229, 58], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [52, 53, 646, 75], \"label\": \"Page-header\"}, {\"bbox_2d\": [52, 83, 285, 95], \"label\": \"Text\"}, {\"bbox_2d\": [52, 138, 281, 151], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 182, 314, 192], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 201, 953, 250], \"label\": \"Text\"}, {\"bbox_2d\": [77, 259, 961, 283], \"label\": \"Text\"}, {\"bbox_2d\": [77, 292, 970, 356], \"label\": \"Text\"}, {\"bbox_2d\": [77, 365, 981, 402], \"label\": \"Text\"}, {\"bbox_2d\": [77, 409, 337, 420], \"label\": \"Text\"}, {\"bbox_2d\": [77, 801, 968, 825], \"label\": \"Text\"}, {\"bbox_2d\": [77, 834, 953, 858], \"label\": \"Text\"}, {\"bbox_2d\": [103, 935, 805, 945], \"label\": \"Text\"}, {\"bbox_2d\": [42, 1003, 220, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [77, 865, 974, 928], \"label\": \"Table\"}, {\"bbox_2d\": [49, 424, 978, 797], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [51, 161, 404, 172], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 181, 314, 191], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 201, 953, 251], \"label\": \"Text\"}, {\"bbox_2d\": [77, 260, 961, 284], \"label\": \"Text\"}, {\"bbox_2d\": [77, 292, 971, 356], \"label\": \"Text\"}, {\"bbox_2d\": [77, 364, 982, 401], \"label\": \"Text\"}, {\"bbox_2d\": [77, 409, 338, 420], \"label\": \"Text\"}, {\"bbox_2d\": [77, 802, 970, 825], \"label\": \"Text\"}, {\"bbox_2d\": [77, 834, 954, 858], \"label\": \"Text\"}, {\"bbox_2d\": [103, 934, 806, 945], \"label\": \"Text\"}, {\"bbox_2d\": [50, 426, 979, 797], \"label\": \"Table\"}, {\"bbox_2d\": [51, 862, 981, 928], \"label\": \"Table\"}, {\"bbox_2d\": [33, 1003, 220, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [51, 140, 282, 151], \"label\": \"Section-header\"}, {\"bbox_2d\": [51, 56, 646, 94], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [81, 53, 292, 93], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 749, 322, 759], \"label\": \"Text\"}, {\"bbox_2d\": [81, 1004, 246, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [79, 90, 958, 735], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [81, 1005, 246, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [81, 748, 320, 757], \"label\": \"Text\"}, {\"bbox_2d\": [73, 89, 961, 738], \"label\": \"Table\"}, {\"bbox_2d\": [81, 54, 291, 93], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 978, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [547, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 45, 208, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 987, 687, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 91, 926, 188], \"label\": \"Table\"}, {\"bbox_2d\": [231, 203, 926, 298], \"label\": \"Table\"}, {\"bbox_2d\": [231, 314, 926, 376], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [109, 45, 208, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [546, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [223, 91, 931, 382], \"label\": \"Table\"}, {\"bbox_2d\": [109, 987, 688, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [161, 94, 871, 275], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 288, 911, 386], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 418, 911, 596], \"label\": \"Text\"}, {\"bbox_2d\": [121, 609, 911, 841], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [159, 91, 873, 259], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 287, 911, 386], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 416, 911, 596], \"label\": \"Text\"}, {\"bbox_2d\": [121, 608, 911, 842], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [367, 80, 671, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 145, 507, 176], \"label\": \"Text\"}, {\"bbox_2d\": [115, 216, 507, 322], \"label\": \"Text\"}, {\"bbox_2d\": [115, 359, 507, 429], \"label\": \"Text\"}, {\"bbox_2d\": [115, 468, 507, 847], \"label\": \"Text\"}, {\"bbox_2d\": [115, 886, 507, 936], \"label\": \"Text\"}, {\"bbox_2d\": [528, 800, 920, 916], \"label\": \"Text\"}, {\"bbox_2d\": [528, 691, 920, 788], \"label\": \"Text\"}, {\"bbox_2d\": [115, 867, 450, 876], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 673, 711, 682], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 449, 269, 458], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 341, 391, 350], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 370, 704, 380], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 145, 920, 254], \"label\": \"Text\"}, {\"bbox_2d\": [528, 262, 920, 350], \"label\": \"Text\"}, {\"bbox_2d\": [528, 389, 920, 555], \"label\": \"Text\"}, {\"bbox_2d\": [527, 569, 922, 653], \"label\": \"Table\"}, {\"bbox_2d\": [511, 999, 523, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [115, 194, 218, 204], \"label\": \"Section-header\"}, {\"bbox_2d\": [367, 79, 668, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [527, 369, 704, 379], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 340, 390, 350], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 447, 269, 457], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 671, 709, 681], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 866, 450, 876], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 144, 511, 175], \"label\": \"Text\"}, {\"bbox_2d\": [115, 212, 511, 321], \"label\": \"Text\"}, {\"bbox_2d\": [115, 358, 511, 428], \"label\": \"Text\"}, {\"bbox_2d\": [115, 466, 511, 847], \"label\": \"Text\"}, {\"bbox_2d\": [115, 885, 511, 935], \"label\": \"Text\"}, {\"bbox_2d\": [527, 144, 923, 253], \"label\": \"Text\"}, {\"bbox_2d\": [527, 261, 923, 350], \"label\": \"Text\"}, {\"bbox_2d\": [527, 388, 923, 555], \"label\": \"Text\"}, {\"bbox_2d\": [527, 690, 924, 915], \"label\": \"Text\"}, {\"bbox_2d\": [512, 999, 523, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [526, 569, 922, 653], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 28, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [106, 83, 940, 147], \"label\": \"Text\"}, {\"bbox_2d\": [106, 171, 879, 193], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 304, 392, 325], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 525, 825, 547], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 592, 587, 614], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 724, 321, 746], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 438, 928, 502], \"label\": \"Text\"}, {\"bbox_2d\": [106, 327, 675, 348], \"label\": \"Text\"}, {\"bbox_2d\": [106, 195, 952, 282], \"label\": \"Text\"}, {\"bbox_2d\": [106, 350, 597, 368], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 372, 407, 391], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 394, 427, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 617, 955, 702], \"label\": \"Text\"}, {\"bbox_2d\": [106, 749, 941, 791], \"label\": \"Text\"}, {\"bbox_2d\": [106, 815, 954, 879], \"label\": \"Text\"}, {\"bbox_2d\": [106, 901, 955, 940], \"label\": \"Text\"}, {\"bbox_2d\": [438, 997, 601, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [106, 82, 946, 148], \"label\": \"Text\"}, {\"bbox_2d\": [106, 326, 678, 347], \"label\": \"Text\"}, {\"bbox_2d\": [106, 437, 939, 503], \"label\": \"Text\"}, {\"bbox_2d\": [106, 547, 678, 569], \"label\": \"Text\"}, {\"bbox_2d\": [106, 614, 961, 702], \"label\": \"Text\"}, {\"bbox_2d\": [106, 747, 949, 790], \"label\": \"Text\"}, {\"bbox_2d\": [106, 813, 961, 879], \"label\": \"Text\"}, {\"bbox_2d\": [106, 900, 964, 940], \"label\": \"Text\"}, {\"bbox_2d\": [438, 996, 602, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [106, 724, 321, 746], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 591, 589, 613], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 525, 826, 547], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 303, 393, 325], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 193, 952, 281], \"label\": \"Text\"}, {\"bbox_2d\": [106, 170, 879, 192], \"label\": \"Section-header\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [106, 348, 598, 370], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 370, 409, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 392, 428, 414], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 107, 910, 176], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 401, 911, 523], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [299, 178, 734, 381], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [121, 107, 910, 185], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 400, 911, 525], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [298, 186, 733, 381], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [526, 37, 546, 48], \"label\": \"Page-header\"}, {\"bbox_2d\": [116, 75, 401, 92], \"label\": \"Text\"}, {\"bbox_2d\": [116, 95, 961, 134], \"label\": \"Text\"}, {\"bbox_2d\": [116, 138, 958, 170], \"label\": \"Text\"}, {\"bbox_2d\": [116, 175, 961, 232], \"label\": \"Text\"}, {\"bbox_2d\": [116, 235, 959, 271], \"label\": \"Text\"}, {\"bbox_2d\": [116, 275, 961, 362], \"label\": \"Text\"}, {\"bbox_2d\": [116, 365, 961, 436], \"label\": \"Text\"}, {\"bbox_2d\": [116, 440, 961, 519], \"label\": \"Text\"}, {\"bbox_2d\": [116, 522, 959, 580], \"label\": \"Text\"}, {\"bbox_2d\": [116, 584, 800, 600], \"label\": \"Text\"}, {\"bbox_2d\": [116, 604, 961, 683], \"label\": \"Text\"}, {\"bbox_2d\": [116, 686, 961, 723], \"label\": \"Text\"}, {\"bbox_2d\": [116, 726, 961, 821], \"label\": \"Text\"}, {\"bbox_2d\": [116, 824, 961, 920], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [528, 40, 545, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [116, 75, 392, 92], \"label\": \"Text\"}, {\"bbox_2d\": [115, 96, 960, 133], \"label\": \"Text\"}, {\"bbox_2d\": [116, 136, 959, 171], \"label\": \"Text\"}, {\"bbox_2d\": [116, 175, 960, 231], \"label\": \"Text\"}, {\"bbox_2d\": [116, 234, 959, 270], \"label\": \"Text\"}, {\"bbox_2d\": [115, 273, 960, 388], \"label\": \"Text\"}, {\"bbox_2d\": [115, 391, 959, 487], \"label\": \"Text\"}, {\"bbox_2d\": [162, 489, 878, 507], \"label\": \"Text\"}, {\"bbox_2d\": [115, 508, 959, 603], \"label\": \"Text\"}, {\"bbox_2d\": [161, 607, 794, 624], \"label\": \"Text\"}, {\"bbox_2d\": [116, 626, 959, 682], \"label\": \"Text\"}, {\"bbox_2d\": [116, 685, 959, 722], \"label\": \"Text\"}, {\"bbox_2d\": [115, 725, 960, 819], \"label\": \"Text\"}, {\"bbox_2d\": [115, 823, 960, 917], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [77, 57, 948, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 328, 503, 365], \"label\": \"Text\"}, {\"bbox_2d\": [116, 368, 503, 520], \"label\": \"Text\"}, {\"bbox_2d\": [116, 529, 503, 552], \"label\": \"Text\"}, {\"bbox_2d\": [116, 556, 503, 786], \"label\": \"Text\"}, {\"bbox_2d\": [116, 798, 503, 967], \"label\": \"Text\"}, {\"bbox_2d\": [556, 823, 948, 955], \"label\": \"Text\"}, {\"bbox_2d\": [556, 638, 948, 810], \"label\": \"Text\"}, {\"bbox_2d\": [556, 454, 948, 624], \"label\": \"Text\"}, {\"bbox_2d\": [556, 330, 948, 441], \"label\": \"Text\"}, {\"bbox_2d\": [116, 1001, 29, 1012], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [77, 57, 947, 311], \"label\": \"Text\"}, {\"bbox_2d\": [115, 327, 501, 518], \"label\": \"Text\"}, {\"bbox_2d\": [115, 531, 501, 781], \"label\": \"Text\"}, {\"bbox_2d\": [115, 794, 501, 965], \"label\": \"Text\"}, {\"bbox_2d\": [556, 328, 941, 439], \"label\": \"Text\"}, {\"bbox_2d\": [556, 453, 941, 624], \"label\": \"Text\"}, {\"bbox_2d\": [556, 637, 941, 808], \"label\": \"Text\"}, {\"bbox_2d\": [556, 821, 941, 953], \"label\": \"Text\"}, {\"bbox_2d\": [18, 1013, 25, 1024], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [225, 72, 372, 84], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 72, 951, 142], \"label\": \"Text\"}, {\"bbox_2d\": [91, 104, 506, 167], \"label\": \"Text\"}, {\"bbox_2d\": [536, 229, 716, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 264, 951, 296], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 291, 896, 304], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 309, 951, 341], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 355, 951, 387], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 392, 951, 424], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 429, 816, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 448, 951, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 484, 951, 526], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 531, 951, 583], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 588, 951, 610], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 615, 951, 647], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 652, 951, 694], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 699, 951, 731], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 736, 951, 759], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 764, 951, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 801, 951, 833], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 838, 951, 870], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 875, 951, 907], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 912, 951, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 949, 951, 962], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 917, 506, 939], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 944, 506, 966], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 971, 506, 993], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 857, 506, 889], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 842, 506, 855], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 801, 506, 833], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 742, 506, 774], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 660, 506, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 627, 506, 659], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 593, 506, 625], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 539, 506, 581], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 497, 506, 530], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 451, 506, 493], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 396, 506, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 341, 506, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 299, 506, 331], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 265, 506, 297], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 229, 506, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 195, 506, 227], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 132, 506, 183], \"label\": \"Text\"}, {\"bbox_2d\": [565, 229, 798, 251], \"label\": \"Text\"}, {\"bbox_2d\": [565, 256, 951, 304], \"label\": \"Text\"}, {\"bbox_2d\": [565, 309, 951, 361], \"label\": \"Text\"}, {\"bbox_2d\": [565, 366, 951, 408], \"label\": \"Text\"}, {\"bbox_2d\": [565, 413, 816, 426], \"label\": \"Text\"}, {\"bbox_2d\": [565, 431, 951, 479], \"label\": \"Text\"}, {\"bbox_2d\": [565, 484, 951, 526], \"label\": \"Text\"}, {\"bbox_2d\": [565, 531, 951, 583], \"label\": \"Text\"}, {\"bbox_2d\": [565, 588, 951, 610], \"label\": \"Text\"}, {\"bbox_2d\": [565, 615, 951, 647], \"label\": \"Text\"}, {\"bbox_2d\": [565, 652, 951, 694], \"label\": \"Text\"}, {\"bbox_2d\": [565, 699, 951, 731], \"label\": \"Text\"}, {\"bbox_2d\": [565, 736, 951, 759], \"label\": \"Text\"}, {\"bbox_2d\": [565, 764, 951, 796], \"label\": \"Text\"}, {\"bbox_2d\": [565, 801, 951, 833], \"label\": \"Text\"}, {\"bbox_2d\": [565, 838, 951, 870], \"label\": \"Text\"}, {\"bbox_2d\": [565, 875, 951, 907], \"label\": \"Text\"}, {\"bbox_2d\": [565, 912, 951, 944], \"label\": \"Text\"}, {\"bbox_2d\": [565", "label": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [224, 72, 373, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 104, 506, 162], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 144], \"label\": \"Text\"}, {\"bbox_2d\": [97, 222, 401, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 235, 506, 249], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 249, 506, 290], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 290, 368, 304], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 304, 506, 345], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 345, 474, 358], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 359, 506, 399], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 400, 506, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [97, 454, 506, 495], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 495, 506, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 536, 506, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 564, 506, 618], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 619, 506, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 646, 437, 659], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 660, 506, 714], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 714, 506, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 742, 506, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 796, 506, 824], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 824, 444, 837], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 837, 506, 865], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 865, 506, 892], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 892, 506, 933], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 933, 505, 947], \"label\": \"List-item\"}, {\"bbox_2d\": [556, 223, 801, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 235, 951, 263], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 263, 951, 290], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 290, 951, 317], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 317, 951, 358], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 359, 951, 386], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 386, 865, 399], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 400, 951, 427], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 427, 900, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 441, 840, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 454, 951, 482], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 482, 951, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 536, 951, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 578, 951, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 605, 951, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 632, 951, 659], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 660, 951, 700], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 701, 951, 728], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 728, 872, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 742, 892, 755], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 755, 951, 783], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 783, 951, 810], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 810, 951, 837], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 837, 951, 865], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 865, 950, 878], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 879, 951, 906], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 906, 951, 933], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 933, 951, 947], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [57, 169, 394, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 188, 108, 199], \"label\": \"Text\"}, {\"bbox_2d\": [57, 500, 257, 606], \"label\": \"Text\"}, {\"bbox_2d\": [57, 619, 253, 821], \"label\": \"Text\"}, {\"bbox_2d\": [57, 834, 258, 926], \"label\": \"Text\"}, {\"bbox_2d\": [293, 485, 497, 526], \"label\": \"Text\"}, {\"bbox_2d\": [293, 539, 488, 598], \"label\": \"Text\"}, {\"bbox_2d\": [293, 611, 502, 844], \"label\": \"Text\"}, {\"bbox_2d\": [532, 618, 737, 676], \"label\": \"Text\"}, {\"bbox_2d\": [532, 689, 725, 779], \"label\": \"Text\"}, {\"bbox_2d\": [532, 530, 738, 603], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 474, 723, 515], \"label\": \"Text\"}, {\"bbox_2d\": [532, 291, 742, 460], \"label\": \"Text\"}, {\"bbox_2d\": [532, 188, 734, 277], \"label\": \"Text\"}, {\"bbox_2d\": [532, 169, 734, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 467, 263, 495], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 467, 418, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 424, 288, 434], \"label\": \"Picture\"}, {\"bbox_2d\": [57, 221, 503, 433], \"label\": \"Picture\"}, {\"bbox_2d\": [814, 1003, 1005, 1012], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [58, 221, 501, 435], \"label\": \"Picture\"}, {\"bbox_2d\": [57, 171, 394, 198], \"label\": \"Caption\"}, {\"bbox_2d\": [55, 468, 263, 495], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 468, 419, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 500, 257, 606], \"label\": \"Text\"}, {\"bbox_2d\": [55, 620, 259, 822], \"label\": \"Text\"}, {\"bbox_2d\": [55, 835, 258, 925], \"label\": \"Text\"}, {\"bbox_2d\": [293, 484, 498, 526], \"label\": \"Text\"}, {\"bbox_2d\": [293, 539, 489, 598], \"label\": \"Text\"}, {\"bbox_2d\": [293, 611, 504, 845], \"label\": \"Text\"}, {\"bbox_2d\": [532, 170, 736, 181], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 186, 736, 276], \"label\": \"Text\"}, {\"bbox_2d\": [532, 290, 744, 460], \"label\": \"Text\"}, {\"bbox_2d\": [532, 473, 725, 516], \"label\": \"Text\"}, {\"bbox_2d\": [532, 530, 739, 540], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 546, 742, 604], \"label\": \"Text\"}, {\"bbox_2d\": [532, 617, 738, 676], \"label\": \"Text\"}, {\"bbox_2d\": [532, 689, 731, 779], \"label\": \"Text\"}, {\"bbox_2d\": [865, 1003, 1005, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 62, 300, 91], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 100, 776, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [310, 954, 732, 970], \"label\": \"Caption\"}, {\"bbox_2d\": [507, 992, 539, 1005], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [64, 62, 312, 92], \"label\": \"Page-header\"}, {\"bbox_2d\": [112, 98, 867, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [310, 957, 731, 972], \"label\": \"Caption\"}, {\"bbox_2d\": [509, 993, 541, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 979, 369, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [90, 46, 420, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [90, 80, 771, 91], \"label\": \"Section-header\"}, {\"bbox_2d\": [90, 139, 337, 149], \"label\": \"Section-header\"}, {\"bbox_2d\": [90, 156, 951, 243], \"label\": \"Text\"}, {\"bbox_2d\": [90, 250, 951, 408], \"label\": \"Text\"}, {\"bbox_2d\": [90, 416, 951, 461], \"label\": \"Text\"}, {\"bbox_2d\": [90, 622, 604, 638], \"label\": \"Text\"}, {\"bbox_2d\": [90, 646, 697, 661], \"label\": \"Text\"}, {\"bbox_2d\": [90, 670, 944, 698], \"label\": \"Text\"}, {\"bbox_2d\": [90, 708, 730, 722], \"label\": \"Text\"}, {\"bbox_2d\": [90, 893, 951, 934], \"label\": \"Text\"}, {\"bbox_2d\": [511, 988, 529, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [90, 476, 931, 616], \"label\": \"Table\"}, {\"bbox_2d\": [90, 734, 932, 866], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [91, 47, 426, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 141, 338, 149], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 156, 945, 239], \"label\": \"Text\"}, {\"bbox_2d\": [91, 251, 951, 407], \"label\": \"Text\"}, {\"bbox_2d\": [91, 419, 944, 459], \"label\": \"Text\"}, {\"bbox_2d\": [90, 475, 943, 612], \"label\": \"Table\"}, {\"bbox_2d\": [91, 627, 603, 635], \"label\": \"Text\"}, {\"bbox_2d\": [91, 648, 696, 659], \"label\": \"Text\"}, {\"bbox_2d\": [91, 672, 944, 698], \"label\": \"Text\"}, {\"bbox_2d\": [91, 710, 760, 721], \"label\": \"Text\"}, {\"bbox_2d\": [90, 733, 947, 865], \"label\": \"Table\"}, {\"bbox_2d\": [91, 894, 944, 934], \"label\": \"Text\"}, {\"bbox_2d\": [510, 988, 527, 1005], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 80, 772, 91], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [533, 38, 545, 48], \"label\": \"Page-header\"}, {\"bbox_2d\": [116, 79, 960, 133], \"label\": \"Text\"}, {\"bbox_2d\": [116, 137, 960, 232], \"label\": \"Text\"}, {\"bbox_2d\": [116, 236, 960, 272], \"label\": \"Text\"}, {\"bbox_2d\": [116, 276, 960, 312], \"label\": \"Text\"}, {\"bbox_2d\": [116, 316, 960, 349], \"label\": \"Text\"}, {\"bbox_2d\": [116, 353, 960, 428], \"label\": \"Text\"}, {\"bbox_2d\": [116, 432, 960, 625], \"label\": \"Text\"}, {\"bbox_2d\": [116, 629, 960, 684], \"label\": \"Text\"}, {\"bbox_2d\": [116, 748, 960, 784], \"label\": \"Text\"}, {\"bbox_2d\": [116, 788, 960, 884], \"label\": \"Text\"}, {\"bbox_2d\": [116, 888, 960, 924], \"label\": \"Text\"}, {\"bbox_2d\": [442, 708, 680, 725], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [116, 76, 959, 133], \"label\": \"Text\"}, {\"bbox_2d\": [117, 137, 960, 231], \"label\": \"Text\"}, {\"bbox_2d\": [116, 235, 959, 270], \"label\": \"Text\"}, {\"bbox_2d\": [117, 274, 960, 310], \"label\": \"Text\"}, {\"bbox_2d\": [117, 314, 959, 350], \"label\": \"Text\"}, {\"bbox_2d\": [117, 353, 960, 428], \"label\": \"Text\"}, {\"bbox_2d\": [116, 431, 959, 625], \"label\": \"Text\"}, {\"bbox_2d\": [117, 628, 959, 683], \"label\": \"Text\"}, {\"bbox_2d\": [441, 708, 679, 724], \"label\": \"Section-header\"}, {\"bbox_2d\": [117, 746, 959, 782], \"label\": \"Text\"}, {\"bbox_2d\": [116, 785, 959, 880], \"label\": \"Text\"}, {\"bbox_2d\": [116, 883, 959, 921], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [106, 121, 916, 959], \"label\": \"Picture\"}, {\"bbox_2d\": [699, 69, 936, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [497, 997, 513, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [315, 948, 712, 963], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [700, 66, 948, 97], \"label\": \"Page-header\"}, {\"bbox_2d\": [105, 119, 918, 928], \"label\": \"Picture\"}, {\"bbox_2d\": [316, 943, 717, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [499, 993, 532, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [226, 167, 808, 442], \"label\": \"Text\"}, {\"bbox_2d\": [226, 452, 808, 753], \"label\": \"Text\"}, {\"bbox_2d\": [508, 909, 525, 922], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [226, 167, 808, 439], \"label\": \"Text\"}, {\"bbox_2d\": [226, 454, 808, 752], \"label\": \"Text\"}, {\"bbox_2d\": [508, 910, 525, 921], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [92, 95, 193, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 111, 509, 150], \"label\": \"Text\"}, {\"bbox_2d\": [92, 158, 509, 283], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 286, 509, 325], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 328, 509, 367], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 372, 509, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 445, 509, 528], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 514, 305, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 513, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 514, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 514, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 514, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 514, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 514, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 514, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 514, 306, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 867, 693, 889], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 829, 655, 840], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 849, 282, 865], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 849, 282, 865], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 849, 282, 865], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 843, 945, 867], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 843, 945, 867], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [92, 849, 282, 865], \"label\": \"Text\"}, {\"bbox_2d\": [92, 849, 282, 865], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 945, 962], \"label", "label": "```json[{\"bbox_2d\": [91, 95, 192, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 111, 507, 149], \"label\": \"Text\"}, {\"bbox_2d\": [91, 158, 507, 283], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 287, 507, 326], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 330, 507, 369], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 373, 507, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 445, 507, 527], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 542, 305, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 560, 512, 814], \"label\": \"Table\"}, {\"bbox_2d\": [528, 95, 766, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 118, 949, 352], \"label\": \"Table\"}, {\"bbox_2d\": [528, 358, 862, 367], \"label\": \"Text\"}, {\"bbox_2d\": [528, 370, 944, 390], \"label\": \"Text\"}, {\"bbox_2d\": [1002, 94, 1013, 309], \"label\": \"Page-header\"}, {\"bbox_2d\": [544, 412, 822, 429], \"label\": \"Caption\"}, {\"bbox_2d\": [107, 849, 282, 866], \"label\": \"Caption\"}, {\"bbox_2d\": [544, 566, 758, 583], \"label\": \"Caption\"}, {\"bbox_2d\": [544, 700, 767, 717], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 822, 409, 830], \"label\": \"Text\"}, {\"bbox_2d\": [105, 876, 470, 949], \"label\": \"Picture\"}, {\"bbox_2d\": [542, 727, 905, 800], \"label\": \"Picture\"}, {\"bbox_2d\": [543, 594, 907, 665], \"label\": \"Picture\"}, {\"bbox_2d\": [540, 440, 920, 533], \"label\": \"Picture\"}, {\"bbox_2d\": [528, 829, 655, 839], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 844, 944, 868], \"label\": \"Text\"}, {\"bbox_2d\": [528, 894, 944, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 880, 693, 890], \"label\": \"Section-header\"}, {\"bbox_2d\": [591, 1000, 945, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [110, 70, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [684, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [217, 156, 832, 422], \"label\": \"Picture\"}, {\"bbox_2d\": [525, 465, 581, 482], \"label\": \"Caption\"}, {\"bbox_2d\": [155, 547, 882, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [524, 900, 580, 917], \"label\": \"Caption\"}, {\"bbox_2d\": [511, 981, 529, 989], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [110, 69, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [685, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [512, 981, 529, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [215, 155, 831, 423], \"label\": \"Picture\"}, {\"bbox_2d\": [150, 547, 883, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [527, 465, 580, 483], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 71, 506, 161], \"label\": \"Text\"}, {\"bbox_2d\": [91, 278, 506, 350], \"label\": \"Text\"}, {\"bbox_2d\": [91, 381, 506, 450], \"label\": \"Text\"}, {\"bbox_2d\": [91, 451, 506, 558], \"label\": \"Text\"}, {\"bbox_2d\": [91, 560, 506, 640], \"label\": \"Text\"}, {\"bbox_2d\": [91, 642, 506, 717], \"label\": \"Text\"}, {\"bbox_2d\": [91, 719, 506, 747], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 129], \"label\": \"Text\"}, {\"bbox_2d\": [536, 131, 951, 190], \"label\": \"Text\"}, {\"bbox_2d\": [91, 949, 506, 961], \"label\": \"Text\"}, {\"bbox_2d\": [536, 949, 951, 961], \"label\": \"Text\"}, {\"bbox_2d\": [210, 171, 506, 223], \"label\": \"Formula\"}, {\"bbox_2d\": [255, 237, 506, 289], \"label\": \"Formula\"}, {\"bbox_2d\": [654, 175, 951, 210], \"label\": \"Formula\"}, {\"bbox_2d\": [210, 333, 506, 368], \"label\": \"Formula\"}, {\"bbox_2d\": [547, 382, 951, 417], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 481, 956, 575], \"label\": \"Formula\"}, {\"bbox_2d\": [568, 509, 951, 575], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 590, 951, 647], \"label\": \"Text\"}, {\"bbox_2d\": [554, 650, 951, 690], \"label\": \"Text\"}, {\"bbox_2d\": [427, 747, 951, 768], \"label\": \"Formula\"}, {\"bbox_2d\": [357, 798, 574, 849], \"label\": \"Formula\"}, {\"bbox_2d\": [260, 862, 951, 928], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 312, 951, 331], \"label\": \"Text\"}, {\"bbox_2d\": [536, 224, 951, 306], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 71, 506, 159], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 129], \"label\": \"Text\"}, {\"bbox_2d\": [536, 131, 951, 159], \"label\": \"Text\"}, {\"bbox_2d\": [91, 277, 506, 320], \"label\": \"Text\"}, {\"bbox_2d\": [91, 379, 506, 452], \"label\": \"Text\"}, {\"bbox_2d\": [91, 454, 506, 557], \"label\": \"Text\"}, {\"bbox_2d\": [91, 559, 506, 662], \"label\": \"Text\"}, {\"bbox_2d\": [91, 664, 506, 692], \"label\": \"Text\"}, {\"bbox_2d\": [536, 649, 951, 692], \"label\": \"Text\"}, {\"bbox_2d\": [536, 585, 951, 646], \"label\": \"Text\"}, {\"bbox_2d\": [536, 397, 951, 445], \"label\": \"Text\"}, {\"bbox_2d\": [536, 222, 951, 310], \"label\": \"Text\"}, {\"bbox_2d\": [536, 312, 951, 340], \"label\": \"Text\"}, {\"bbox_2d\": [172, 170, 506, 263], \"label\": \"Formula\"}, {\"bbox_2d\": [211, 330, 506, 364], \"label\": \"Formula\"}, {\"bbox_2d\": [652, 170, 951, 207], \"label\": \"Formula\"}, {\"bbox_2d\": [640, 354, 951, 385], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 459, 958, 590], \"label\": \"Formula\"}, {\"bbox_2d\": [429, 748, 951, 763], \"label\": \"Formula\"}, {\"bbox_2d\": [356, 792, 686, 853], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 856, 951, 928], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 838, 466, 851], \"label\": \"Text\"}, {\"bbox_2d\": [91, 947, 506, 960], \"label\": \"Text\"}, {\"bbox_2d\": [536, 947, 951, 960], \"label\": \"Text\"}, {\"bbox_2d\": [91, 772, 139, 785], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [835, 47, 971, 123], \"label\": \"Picture\"}, {\"bbox_2d\": [112, 973, 923, 1011], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [111, 970, 924, 1011], \"label\": \"Title\"}, {\"bbox_2d\": [838, 110, 970, 123], \"label\": \"Text\"}, {\"bbox_2d\": [836, 52, 976, 106], \"label\": \"Picture\"}, {\"bbox_2d\": [-1, 194, 1030, 948], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 93, 510, 163], \"label\": \"Text\"}, {\"bbox_2d\": [528, 93, 947, 149], \"label\": \"Text\"}, {\"bbox_2d\": [91, 195, 511, 249], \"label\": \"Text\"}, {\"bbox_2d\": [91, 279, 509, 347], \"label\": \"Text\"}, {\"bbox_2d\": [91, 358, 511, 470], \"label\": \"Text\"}, {\"bbox_2d\": [528, 203, 947, 228], \"label\": \"Text\"}, {\"bbox_2d\": [528, 441, 948, 525], \"label\": \"Text\"}, {\"bbox_2d\": [91, 499, 510, 598], \"label\": \"Text\"}, {\"bbox_2d\": [528, 554, 948, 595], \"label\": \"Text\"}, {\"bbox_2d\": [528, 606, 947, 674], \"label\": \"Text\"}, {\"bbox_2d\": [528, 685, 947, 741], \"label\": \"Text\"}, {\"bbox_2d\": [91, 629, 509, 698], \"label\": \"Text\"}, {\"bbox_2d\": [91, 728, 509, 826], \"label\": \"Text\"}, {\"bbox_2d\": [528, 750, 947, 877], \"label\": \"Text\"}, {\"bbox_2d\": [91, 872, 510, 955], \"label\": \"Text\"}, {\"bbox_2d\": [91, 843, 444, 870], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 615, 217, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 488, 291, 499], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 541, 715, 552], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 715, 426, 727], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 166, 648, 178], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 179, 329, 190], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 265, 274, 276], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 233, 947, 287], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 292, 947, 317], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 319, 947, 343], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 346, 947, 388], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 391, 947, 432], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 888, 947, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [590, 999, 947, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [1004, 93, 1013, 309], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 95, 507, 163], \"label\": \"Text\"}, {\"bbox_2d\": [91, 194, 507, 248], \"label\": \"Text\"}, {\"bbox_2d\": [91, 279, 507, 347], \"label\": \"Text\"}, {\"bbox_2d\": [91, 359, 507, 470], \"label\": \"Text\"}, {\"bbox_2d\": [91, 502, 507, 598], \"label\": \"Text\"}, {\"bbox_2d\": [91, 630, 507, 698], \"label\": \"Text\"}, {\"bbox_2d\": [91, 729, 507, 826], \"label\": \"Text\"}, {\"bbox_2d\": [91, 872, 507, 954], \"label\": \"Text\"}, {\"bbox_2d\": [528, 95, 944, 148], \"label\": \"Text\"}, {\"bbox_2d\": [528, 442, 944, 524], \"label\": \"Text\"}, {\"bbox_2d\": [528, 555, 944, 594], \"label\": \"Text\"}, {\"bbox_2d\": [528, 606, 944, 674], \"label\": \"Text\"}, {\"bbox_2d\": [528, 686, 944, 739], \"label\": \"Text\"}, {\"bbox_2d\": [528, 752, 944, 877], \"label\": \"Text\"}, {\"bbox_2d\": [591, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 843, 443, 868], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 715, 427, 725], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 616, 216, 626], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 487, 291, 497], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 265, 274, 275], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 180, 328, 190], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 166, 648, 177], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 232, 944, 286], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 290, 944, 314], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 319, 944, 343], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 347, 944, 386], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 391, 944, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 541, 715, 551], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 189, 922, 199], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 204, 944, 228], \"label\": \"Text\"}, {\"bbox_2d\": [1002, 94, 1013, 309], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [541, 422, 937, 477], \"label\": \"Text\"}, {\"bbox_2d\": [541, 484, 938, 728], \"label\": \"Text\"}, {\"bbox_2d\": [541, 735, 937, 936], \"label\": \"Text\"}, {\"bbox_2d\": [541, 943, 937, 978], \"label\": \"Text\"}, {\"bbox_2d\": [62, 422, 460, 498], \"label\": \"Text\"}, {\"bbox_2d\": [62, 505, 460, 623], \"label\": \"Text\"}, {\"bbox_2d\": [62, 630, 460, 810], \"label\": \"Text\"}, {\"bbox_2d\": [62, 818, 460, 978], \"label\": \"Text\"}, {\"bbox_2d\": [62, 997, 78, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [61, 48, 1035, 401], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [59, 45, 1035, 402], \"label\": \"Picture\"}, {\"bbox_2d\": [62, 422, 459, 496], \"label\": \"Text\"}, {\"bbox_2d\": [62, 506, 459, 621], \"label\": \"Text\"}, {\"bbox_2d\": [62, 631, 459, 810], \"label\": \"Text\"}, {\"bbox_2d\": [62, 820, 459, 977], \"label\": \"Text\"}, {\"bbox_2d\": [540, 422, 937, 475], \"label\": \"Text\"}, {\"bbox_2d\": [540, 485, 937, 726], \"label\": \"Text\"}, {\"bbox_2d\": [540, 736, 937, 935], \"label\": \"Text\"}, {\"bbox_2d\": [540, 945, 937, 977], \"label\": \"Text\"}, {\"bbox_2d\": [62, 998, 77, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [947, 86, 977, 252], \"label\": \"Page-header\"}, {\"bbox_2d\": [154, 304, 446, 726], \"label\": \"Picture\"}, {\"bbox_2d\": [546, 287, 864, 727], \"label\": \"Picture\"}, {\"bbox_2d\": [233, 744, 772, 760], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [233, 739, 774, 760], \"label\": \"Caption\"}, {\"bbox_2d\": [148, 269, 873, 735], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [681, 45, 937, 60], \"label\": \"Page-header\"}, {\"bbox_2d\": [370, 79, 665, 95], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 103, 806, 115], \"label\": \"Text\"}, {\"bbox_2d\": [231, 138, 800, 171], \"label\": \"Text\"}, {\"bbox_2d\": [98, 196, 346, 209], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 231, 901, 263], \"label\": \"Text\"}, {\"bbox_2d\": [288, 496, 771, 509], \"label\": \"Text\"}, {\"bbox_2d\": [221, 406, 342, 418], \"label\": \"Text\"}, {\"bbox_2d\": [290, 287, 488, 320], \"label\": \"Text\"}, {\"bbox_2d\": [549, 287, 847, 319], \"label\": \"Text\"}, {\"bbox_2d\": [290, 340, 565, 446], \"label\": \"Picture\"}, {\"bbox_2d\": [738, 336, 863, 459], \"label\": \"Picture\"}, {\"bbox_2d\": [88, 280, 367, 476], \"label\": \"Picture\"}, {\"bbox_2d\": [132, 347, 317, 384], \"label\": \"Picture\"}, {\"bbox_2d\": [132, 559, 322, 726], \"label\": \"Picture\"}, {\"bbox_2d\": [401, 561, 597, 727], \"label\": \"Picture\"}, {\"bbox_2d\": [703, 561, 878, 726], \"label\": \"Picture\"}, {\"bbox_2d\": [648, 524, 829, 539], \"label\": \"Caption\"}, {\"bbox_2d\": [478, 524, 513, 539], \"label\": \"Caption\"}, {\"bbox_2d\": [161, 524, 297, 539], \"label\": \"Caption\"}, {\"bbox_2d\": [161, 536, 297, 540], \"label\": \"Caption\"}, {\"bbox_2d\": [161, 549, 297, 543], \"label\": \"Caption\"}, {\"bbox_2d\": [161, 551, 297, 545], \"label\": \"Caption\"}, {\"bbox_2d\": [931, 957, 953, 972], \"label\": \"Page-footer\"}, {\"bbox_2d\": [86, 977, 182, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [679, 40, 942, 60], \"label\": \"Page-header\"}, {\"bbox_2d\": [234, 134, 801, 173], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 190, 349, 211], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 226, 905, 264], \"label\": \"Text\"}, {\"bbox_2d\": [913, 955, 959, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 974, 187, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [369, 73, 673, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [229, 97, 811, 117], \"label\": \"Text\"}, {\"bbox_2d\": [87, 280, 975, 776], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [226, 167, 567, 181], \"label\": \"Text\"}, {\"bbox_2d\": [226, 192, 808, 544], \"label\": \"Text\"}, {\"bbox_2d\": [226, 648, 808, 870], \"label\": \"Text\"}, {\"bbox_2d\": [226, 581, 300, 593], \"label\": \"Section-header\"}, {\"bbox_2d\": [226, 620, 468, 631], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 908, 521, 922], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [226, 167, 566, 178], \"label\": \"Text\"}, {\"bbox_2d\": [226, 193, 808, 543], \"label\": \"Text\"}, {\"bbox_2d\": [226, 649, 808, 869], \"label\": \"Text\"}, {\"bbox_2d\": [513, 910, 521, 921], \"label\": \"Page-footer\"}, {\"bbox_2d\": [226, 619, 467, 631], \"label\": \"Section-header\"}, {\"bbox_2d\": [226, 581, 300, 593], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 94, 911, 122], \"label\": \"Text\"}, {\"bbox_2d\": [125, 127, 911, 155], \"label\": \"Text\"}, {\"bbox_2d\": [125, 160, 911, 198], \"label\": \"Text\"}, {\"bbox_2d\": [125, 203, 911, 248], \"label\": \"Text\"}, {\"bbox_2d\": [125, 254, 911, 282], \"label\": \"Text\"}, {\"bbox_2d\": [125, 288, 911, 326], \"label\": \"Text\"}, {\"bbox_2d\": [125, 331, 911, 354], \"label\": \"Text\"}, {\"bbox_2d\": [125, 359, 911, 396], \"label\": \"Text\"}, {\"bbox_2d\": [125, 402, 911, 425], \"label\": \"Text\"}, {\"bbox_2d\": [125, 430, 911, 463], \"label\": \"Text\"}, {\"bbox_2d\": [125, 469, 911, 506], \"label\": \"Text\"}, {\"bbox_2d\": [125, 511, 911, 548], \"label\": \"Text\"}, {\"bbox_2d\": [125, 554, 911, 586], \"label\": \"Text\"}, {\"bbox_2d\": [125, 592, 911, 619], \"label\": \"Text\"}, {\"bbox_2d\": [125, 625, 911, 652], \"label\": \"Text\"}, {\"bbox_2d\": [125, 658, 911, 695], \"label\": \"Text\"}, {\"bbox_2d\": [125, 701, 911, 728], \"label\": \"Text\"}, {\"bbox_2d\": [125, 734, 911, 761], \"label\": \"Text\"}, {\"bbox_2d\": [125, 767, 911, 811], \"label\": \"Text\"}, {\"bbox_2d\": [125, 817, 911, 854], \"label\": \"Text\"}, {\"bbox_2d\": [125, 860, 911, 887], \"label\": \"Text\"}, {\"bbox_2d\": [125, 893, 911, 910], \"label\": \"Text\"}, {\"bbox_2d\": [125, 916, 911, 933], \"label\": \"Text\"}, {\"bbox_2d\": [508, 972, 527, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [125, 92, 910, 123], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 128, 911, 158], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 164, 911, 194], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 217, 911, 247], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 199, 911, 212], \"label\": \"Text\"}, {\"bbox_2d\": [125, 253, 911, 283], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 289, 910, 319], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 324, 911, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 360, 911, 390], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 396, 911, 426], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 431, 911, 462], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 467, 911, 497], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 503, 911, 533], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 556, 911, 586], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 592, 911, 622], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 539, 779, 551], \"label\": \"Text\"}, {\"bbox_2d\": [125, 628, 911, 658], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 663, 911, 694], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 699, 911, 729], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 753, 911, 783], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 788, 911, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 824, 911, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 860, 911, 890], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 735, 834, 747], \"label\": \"Text\"}, {\"bbox_2d\": [125, 895, 857, 908], \"label\": \"Text\"}, {\"bbox_2d\": [125, 913, 911, 925], \"label\": \"Text\"}, {\"bbox_2d\": [508, 974, 527, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [727, 62, 934, 94], \"label\": \"Page-header\"}, {\"bbox_2d\": [119, 113, 853, 978], \"label\": \"Picture\"}, {\"bbox_2d\": [317, 982, 698, 1001], \"label\": \"Caption\"}, {\"bbox_2d\": [486, 1007, 501, 1021], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [110, 109, 915, 975], \"label\": \"Picture\"}, {\"bbox_2d\": [318, 980, 691, 996], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 1002, 524, 1017], \"label\": \"Page-footer\"}, {\"bbox_2d\": [726, 64, 949, 94], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [31, 288, 62, 816], \"label\": \"Page-header\"}, {\"bbox_2d\": [232, 66, 839, 79], \"label\": \"Title\"}, {\"bbox_2d\": [315, 97, 710, 120], \"label\": \"Text\"}, {\"bbox_2d\": [491, 139, 583, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 155, 933, 190], \"label\": \"Text\"}, {\"bbox_2d\": [225, 403, 382, 412], \"label\": \"Section-header\"}, {\"bbox_2d\": [108, 434, 525, 670], \"label\": \"Text\"}, {\"bbox_2d\": [91, 673, 525, 784], \"label\": \"Text\"}, {\"bbox_2d\": [91, 785, 525, 842], \"label\": \"Text\"}, {\"bbox_2d\": [553, 403, 978, 484], \"label\": \"Text\"}, {\"bbox_2d\": [553, 514, 978, 625], \"label\": \"Text\"}, {\"bbox_2d\": [553, 655, 978, 892], \"label\": \"Text\"}, {\"bbox_2d\": [108, 880, 296, 889], \"label\": \"Text\"}, {\"bbox_2d\": [188, 354, 556, 374], \"label\": \"Text\"}, {\"bbox_2d\": [200, 200, 978, 338], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [233, 66, 839, 79], \"label\": \"Title\"}, {\"bbox_2d\": [312, 95, 759, 108], \"label\": \"Text\"}, {\"bbox_2d\": [250, 111, 822, 120], \"label\": \"Text\"}, {\"bbox_2d\": [488, 139, 583, 150], \"label\": \"Text\"}, {\"bbox_2d\": [136, 153, 935, 175], \"label\": \"Text\"}, {\"bbox_2d\": [465, 179, 606, 189], \"label\": \"Text\"}, {\"bbox_2d\": [189, 200, 882, 338], \"label\": \"Text\"}, {\"bbox_2d\": [189, 354, 778, 375], \"label\": \"Text\"}, {\"bbox_2d\": [93, 434, 520, 671], \"label\": \"Text\"}, {\"bbox_2d\": [93, 674, 520, 783], \"label\": \"Text\"}, {\"bbox_2d\": [93, 787, 520, 840], \"label\": \"Text\"}, {\"bbox_2d\": [98, 877, 292, 888], \"label\": \"Footnote\"}, {\"bbox_2d\": [551, 655, 978, 892], \"label\": \"Text\"}, {\"bbox_2d\": [551, 515, 978, 624], \"label\": \"Text\"}, {\"bbox_2d\": [551, 403, 978, 484], \"label\": \"Text\"}, {\"bbox_2d\": [232, 403, 382, 413], \"label\": \"Section-header\"}, {\"bbox_2d\": [31, 290, 63, 818], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 106, 949, 292], \"label\": \"Table\"}, {\"bbox_2d\": [121, 295, 470, 304], \"label\": \"Text\"}, {\"bbox_2d\": [121, 320, 740, 332], \"label\": \"Text\"}, {\"bbox_2d\": [121, 334, 901, 372], \"label\": \"Text\"}, {\"bbox_2d\": [121, 375, 902, 461], \"label\": \"Text\"}, {\"bbox_2d\": [121, 492, 487, 504], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 520, 475, 532], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 644, 572, 656], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 540, 899, 631], \"label\": \"Table\"}, {\"bbox_2d\": [121, 666, 890, 770], \"label\": \"Table\"}, {\"bbox_2d\": [121, 787, 867, 814], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 816, 841, 827], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 940, 536, 954], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [118, 106, 947, 294], \"label\": \"Table\"}, {\"bbox_2d\": [121, 292, 473, 305], \"label\": \"Text\"}, {\"bbox_2d\": [129, 317, 742, 332], \"label\": \"Text\"}, {\"bbox_2d\": [129, 332, 904, 376], \"label\": \"Text\"}, {\"bbox_2d\": [129, 375, 903, 462], \"label\": \"Text\"}, {\"bbox_2d\": [119, 537, 886, 633], \"label\": \"Table\"}, {\"bbox_2d\": [120, 663, 884, 772], \"label\": \"Table\"}, {\"bbox_2d\": [129, 784, 870, 813], \"label\": \"Text\"}, {\"bbox_2d\": [129, 813, 845, 828], \"label\": \"Text\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 643, 575, 658], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 519, 478, 534], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 490, 491, 505], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [92, 93, 303, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 438, 253, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 108, 510, 177], \"label\": \"Text\"}, {\"bbox_2d\": [92, 181, 510, 349], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 353, 510, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 145, 947, 213], \"label\": \"Text\"}, {\"bbox_2d\": [526, 239, 947, 322], \"label\": \"Text\"}, {\"bbox_2d\": [526, 348, 947, 402], \"label\": \"Text\"}, {\"bbox_2d\": [92, 452, 720, 463], \"label\": \"Text\"}, {\"bbox_2d\": [526, 609, 947, 650], \"label\": \"Text\"}, {\"bbox_2d\": [92, 609, 510, 637], \"label\": \"Text\"}, {\"bbox_2d\": [526, 680, 947, 721], \"label\": \"Text\"}, {\"bbox_2d\": [526, 733, 947, 816], \"label\": \"Text\"}, {\"bbox_2d\": [92, 647, 510, 773], \"label\": \"Text\"}, {\"bbox_2d\": [92, 785, 510, 809], \"label\": \"Text\"}, {\"bbox_2d\": [92, 813, 510, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 858, 510, 909], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 922, 510, 961], \"label\": \"Text\"}, {\"bbox_2d\": [526, 826, 947, 854], \"label\": \"Text\"}, {\"bbox_2d\": [526, 882, 947, 952], \"label\": \"Text\"}, {\"bbox_2d\": [526, 666, 660, 677], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 333, 799, 344], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 225, 786, 236], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 131, 592, 141], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 95, 947, 118], \"label\": \"Text\"}, {\"bbox_2d\": [591, 998, 947, 1006], \"label\": \"Page-footer\"}, {\"bbox_2d\": [526, 475, 786, 485], \"label\": \"Section-header\"}, {\"bbox_2d\": [623, 497, 916, 528], \"label\": \"Text\"}, {\"bbox_2d\": [623, 536, 819, 546], \"label\": \"Text\"}, {\"bbox_2d\": [623, 567, 777, 576], \"label\": \"Text\"}, {\"bbox_2d\": [92, 536, 223, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 567, 193, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [623, 868, 607, 878], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 536, 562, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 536, 223, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 536, 223, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 536, 223, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 536, 223, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 536, 223, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 536, 223, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 567, 562, 576], \"label\": \"Section-header\"}, {\"bbox", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 343], \"label\": \"Page-header\"}, {\"bbox_2d\": [543, 95, 944, 119], \"label\": \"Text\"}, {\"bbox_2d\": [528, 146, 944, 214], \"label\": \"Text\"}, {\"bbox_2d\": [91, 109, 507, 177], \"label\": \"Text\"}, {\"bbox_2d\": [528, 132, 593, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 95, 302, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 181, 507, 292], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 296, 507, 350], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 354, 507, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 226, 786, 236], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 240, 944, 322], \"label\": \"Text\"}, {\"bbox_2d\": [528, 349, 944, 402], \"label\": \"Text\"}, {\"bbox_2d\": [528, 334, 800, 344], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 438, 253, 448], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 452, 720, 463], \"label\": \"Text\"}, {\"bbox_2d\": [77, 468, 962, 597], \"label\": \"Table\"}, {\"bbox_2d\": [528, 611, 944, 650], \"label\": \"Text\"}, {\"bbox_2d\": [91, 611, 507, 636], \"label\": \"Text\"}, {\"bbox_2d\": [91, 648, 507, 773], \"label\": \"Text\"}, {\"bbox_2d\": [528, 682, 944, 721], \"label\": \"Text\"}, {\"bbox_2d\": [528, 733, 944, 815], \"label\": \"Text\"}, {\"bbox_2d\": [91, 785, 507, 810], \"label\": \"Text\"}, {\"bbox_2d\": [528, 827, 944, 851], \"label\": \"Text\"}, {\"bbox_2d\": [528, 883, 944, 951], \"label\": \"Text\"}, {\"bbox_2d\": [91, 900, 507, 961], \"label\": \"Text\"}, {\"bbox_2d\": [91, 857, 507, 896], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 814, 507, 853], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 667, 660, 677], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 869, 607, 879], \"label\": \"Section-header\"}, {\"bbox_2d\": [591, 1000, 945, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [281, 870, 898, 938], \"label\": \"Section-header\"}, {\"bbox_2d\": [1005, 1009, 1018, 1021], \"label\": \"Page-footer\"}, {\"bbox_2d\": [103, 104, 763, 195], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 214, 214, 263], \"label\": \"Text\"}, {\"bbox_2d\": [79, 267, 188, 360], \"label\": \"Text\"}, {\"bbox_2d\": [79, 365, 185, 379], \"label\": \"Text\"}, {\"bbox_2d\": [79, 383, 296, 446], \"label\": \"Text\"}, {\"bbox_2d\": [79, 450, 218, 469], \"label\": \"Text\"}, {\"bbox_2d\": [385, 214, 578, 294], \"label\": \"Text\"}, {\"bbox_2d\": [385, 301, 520, 351], \"label\": \"Text\"}, {\"bbox_2d\": [385, 354, 646, 380], \"label\": \"Text\"}, {\"bbox_2d\": [385, 383, 578, 408], \"label\": \"Text\"}, {\"bbox_2d\": [385, 412, 502, 436], \"label\": \"Text\"}, {\"bbox_2d\": [385, 440, 567, 466], \"label\": \"Text\"}, {\"bbox_2d\": [79, 698, 877, 850], \"label\": \"Table\"}, {\"bbox_2d\": [132, 698, 258, 845], \"label\": \"Picture\"}, {\"bbox_2d\": [432, 909, 956, 1016], \"label\": \"Picture\"}, {\"bbox_2d\": [696, 214, 875, 294], \"label\": \"Text\"}, {\"bbox_2d\": [696, 297, 965, 360], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [77, 97, 761, 201], \"label\": \"Section-header\"}, {\"bbox_2d\": [388, 301, 452, 315], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 365, 183, 379], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 214, 159, 228], \"label\": \"Section-header\"}, {\"bbox_2d\": [697, 216, 768, 241], \"label\": \"Text\"}, {\"bbox_2d\": [697, 244, 876, 269], \"label\": \"Text\"}, {\"bbox_2d\": [697, 272, 828, 297], \"label\": \"Text\"}, {\"bbox_2d\": [697, 300, 964, 337], \"label\": \"Text\"}, {\"bbox_2d\": [697, 339, 795, 365], \"label\": \"Text\"}, {\"bbox_2d\": [388, 317, 607, 354], \"label\": \"Text\"}, {\"bbox_2d\": [388, 357, 643, 382], \"label\": \"Text\"}, {\"bbox_2d\": [388, 385, 576, 410], \"label\": \"Text\"}, {\"bbox_2d\": [388, 413, 503, 438], \"label\": \"Text\"}, {\"bbox_2d\": [388, 441, 567, 466], \"label\": \"Text\"}, {\"bbox_2d\": [79, 421, 214, 446], \"label\": \"Text\"}, {\"bbox_2d\": [79, 381, 298, 418], \"label\": \"Text\"}, {\"bbox_2d\": [79, 335, 212, 360], \"label\": \"Text\"}, {\"bbox_2d\": [79, 307, 212, 332], \"label\": \"Text\"}, {\"bbox_2d\": [79, 279, 214, 304], \"label\": \"Text\"}, {\"bbox_2d\": [79, 227, 214, 264], \"label\": \"Text\"}, {\"bbox_2d\": [388, 216, 577, 241], \"label\": \"Text\"}, {\"bbox_2d\": [388, 244, 505, 269], \"label\": \"Text\"}, {\"bbox_2d\": [388, 272, 505, 297], \"label\": \"Text\"}, {\"bbox_2d\": [79, 266, 146, 280], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 808, 260, 842], \"label\": \"Caption\"}, {\"bbox_2d\": [281, 850, 902, 948], \"label\": \"Text\"}, {\"bbox_2d\": [1006, 1013, 1021, 1023], \"label\": \"Page-footer\"}, {\"bbox_2d\": [279, 698, 875, 850], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 603], \"label\": \"Text\"}, {\"bbox_2d\": [121, 616, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 922, 603], \"label\": \"Text\"}, {\"bbox_2d\": [121, 616, 922, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 989, 511, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 44, 433, 62], \"label\": \"Section-header\"}, {\"bbox_2d\": [61, 81, 412, 97], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 472, 557, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [61, 498, 404, 513], \"label\": \"Section-header\"}, {\"bbox_2d\": [61, 712, 407, 726], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 1001, 527, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [27, 407, 39, 639], \"label\": \"Page-header\"}, {\"bbox_2d\": [62, 106, 326, 145], \"label\": \"Text\"}, {\"bbox_2d\": [189, 233, 322, 244], \"label\": \"Text\"}, {\"bbox_2d\": [189, 318, 340, 328], \"label\": \"Text\"}, {\"bbox_2d\": [189, 388, 300, 413], \"label\": \"Text\"}, {\"bbox_2d\": [189, 432, 528, 456], \"label\": \"Text\"}, {\"bbox_2d\": [62, 528, 280, 540], \"label\": \"Text\"}, {\"bbox_2d\": [62, 559, 281, 583], \"label\": \"Text\"}, {\"bbox_2d\": [62, 599, 242, 611], \"label\": \"Text\"}, {\"bbox_2d\": [62, 783, 230, 795], \"label\": \"Text\"}, {\"bbox_2d\": [62, 743, 261, 768], \"label\": \"Text\"}, {\"bbox_2d\": [288, 530, 363, 643], \"label\": \"Picture\"}, {\"bbox_2d\": [347, 656, 497, 695], \"label\": \"Text\"}, {\"bbox_2d\": [303, 884, 441, 909], \"label\": \"Text\"}, {\"bbox_2d\": [303, 926, 463, 981], \"label\": \"Text\"}, {\"bbox_2d\": [560, 886, 741, 925], \"label\": \"Text\"}, {\"bbox_2d\": [527, 740, 738, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [794, 739, 916, 751], \"label\": \"Section-header\"}, {\"bbox_2d\": [300, 743, 451, 753], \"label\": \"Section-header\"}, {\"bbox_2d\": [300, 763, 438, 868], \"label\": \"Picture\"}, {\"bbox_2d\": [560, 758, 694, 868], \"label\": \"Picture\"}, {\"bbox_2d\": [778, 758, 910, 867], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [60, 42, 431, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [61, 467, 557, 490], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 103, 158, 133], \"label\": \"Text\"}, {\"bbox_2d\": [190, 102, 279, 119], \"label\": \"Text\"}, {\"bbox_2d\": [190, 131, 326, 147], \"label\": \"Text\"}, {\"bbox_2d\": [190, 230, 321, 246], \"label\": \"Text\"}, {\"bbox_2d\": [190, 314, 340, 330], \"label\": \"Text\"}, {\"bbox_2d\": [190, 385, 300, 415], \"label\": \"Text\"}, {\"bbox_2d\": [190, 427, 533, 458], \"label\": \"Text\"}, {\"bbox_2d\": [681, 118, 870, 463], \"label\": \"Picture\"}, {\"bbox_2d\": [330, 529, 487, 644], \"label\": \"Picture\"}, {\"bbox_2d\": [300, 757, 442, 875], \"label\": \"Picture\"}, {\"bbox_2d\": [550, 755, 710, 868], \"label\": \"Picture\"}, {\"bbox_2d\": [781, 755, 913, 873], \"label\": \"Picture\"}, {\"bbox_2d\": [299, 739, 452, 756], \"label\": \"Text\"}, {\"bbox_2d\": [528, 736, 743, 753], \"label\": \"Text\"}, {\"bbox_2d\": [793, 736, 917, 753], \"label\": \"Text\"}, {\"bbox_2d\": [561, 883, 746, 928], \"label\": \"Text\"}, {\"bbox_2d\": [304, 880, 440, 911], \"label\": \"Text\"}, {\"bbox_2d\": [304, 923, 464, 981], \"label\": \"Text\"}, {\"bbox_2d\": [508, 996, 527, 1017], \"label\": \"Page-footer\"}, {\"bbox_2d\": [61, 782, 232, 798], \"label\": \"Text\"}, {\"bbox_2d\": [61, 739, 260, 769], \"label\": \"Text\"}, {\"bbox_2d\": [344, 653, 498, 697], \"label\": \"Text\"}, {\"bbox_2d\": [61, 526, 275, 542], \"label\": \"Text\"}, {\"bbox_2d\": [61, 554, 285, 584], \"label\": \"Text\"}, {\"bbox_2d\": [61, 596, 241, 612], \"label\": \"Text\"}, {\"bbox_2d\": [61, 709, 407, 728], \"label\": \"Section-header\"}, {\"bbox_2d\": [61, 496, 403, 515], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 79, 412, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 407, 40, 638], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [533, 112, 944, 217], \"label\": \"Text\"}, {\"bbox_2d\": [533, 228, 953, 334], \"label\": \"Text\"}, {\"bbox_2d\": [533, 346, 922, 373], \"label\": \"Text\"}, {\"bbox_2d\": [81, 322, 497, 380], \"label\": \"Text\"}, {\"bbox_2d\": [81, 392, 488, 450], \"label\": \"Text\"}, {\"bbox_2d\": [81, 462, 497, 522], \"label\": \"Text\"}, {\"bbox_2d\": [81, 622, 491, 666], \"label\": \"Text\"}, {\"bbox_2d\": [81, 774, 497, 894], \"label\": \"Text\"}, {\"bbox_2d\": [81, 1004, 246, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [81, 112, 487, 122], \"label\": \"Text\"}, {\"bbox_2d\": [533, 589, 943, 604], \"label\": \"Text\"}, {\"bbox_2d\": [533, 546, 628, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 576, 647, 588], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 612, 956, 798], \"label\": \"Table\"}, {\"bbox_2d\": [78, 647, 500, 752], \"label\": \"Table\"}, {\"bbox_2d\": [79, 535, 499, 601], \"label\": \"Table\"}, {\"bbox_2d\": [79, 236, 500, 300], \"label\": \"Table\"}, {\"bbox_2d\": [81, 201, 488, 228], \"label\": \"Text\"}, {\"bbox_2d\": [80, 129, 503, 179], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [81, 112, 495, 122], \"label\": \"Text\"}, {\"bbox_2d\": [533, 112, 945, 216], \"label\": \"Text\"}, {\"bbox_2d\": [533, 229, 951, 334], \"label\": \"Text\"}, {\"bbox_2d\": [81, 201, 488, 227], \"label\": \"Text\"}, {\"bbox_2d\": [533, 347, 927, 373], \"label\": \"Text\"}, {\"bbox_2d\": [81, 322, 499, 380], \"label\": \"Text\"}, {\"bbox_2d\": [81, 393, 491, 450], \"label\": \"Text\"}, {\"bbox_2d\": [81, 464, 497, 521], \"label\": \"Text\"}, {\"bbox_2d\": [533, 593, 944, 603], \"label\": \"Text\"}, {\"bbox_2d\": [81, 623, 492, 665], \"label\": \"Text\"}, {\"bbox_2d\": [81, 774, 501, 894], \"label\": \"Text\"}, {\"bbox_2d\": [81, 1005, 94, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [111, 1005, 246, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [533, 575, 647, 588], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 542, 629, 560], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 133, 504, 182], \"label\": \"Table\"}, {\"bbox_2d\": [78, 238, 502, 304], \"label\": \"Table\"}, {\"bbox_2d\": [78, 532, 502, 603], \"label\": \"Table\"}, {\"bbox_2d\": [78, 676, 504, 753], \"label\": \"Table\"}, {\"bbox_2d\": [531, 613, 954, 801], \"label\": \"Table\"}, {\"bbox_2d\": [533, 384, 954, 507], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 25, 352, 38], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 94, 238, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 94, 651, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 109, 511, 176], \"label\": \"Text\"}, {\"bbox_2d\": [528, 109, 946, 162], \"label\": \"Text\"}, {\"bbox_2d\": [91, 999, 444, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [528, 284, 945, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 308, 729, 317], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 330, 945, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [89, 186, 508, 409], \"label\": \"Table\"}, {\"bbox_2d\": [526, 173, 947, 278], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [528, 95, 650, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 95, 237, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 109, 944, 163], \"label\": \"Text\"}, {\"bbox_2d\": [91, 109, 507, 177], \"label\": \"Text\"}, {\"bbox_2d\": [528, 285, 944, 305], \"label\": \"Text\"}, {\"bbox_2d\": [528, 308, 728, 317], \"label\": \"Text\"}, {\"bbox_2d\": [528, 320, 944, 375], \"label\": \"Text\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [87, 189, 511, 408], \"label\": \"Table\"}, {\"bbox_2d\": [523, 176, 949, 278], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [443, 118, 592, 139], \"label\": \"Section-header\"}, {\"bbox_2d\": [443, 171, 595, 189], \"label\": \"Section-header\"}, {\"bbox_2d\": [345, 200, 690, 215], \"label\": \"Section-header\"}, {\"bbox_2d\": [765, 175, 880, 187], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 228, 498, 512], \"label\": \"Text\"}, {\"bbox_2d\": [156, 602, 850, 615], \"label\": \"Text\"}, {\"bbox_2d\": [156, 726, 235, 935], \"label\": \"Text\"}, {\"bbox_2d\": [443, 546, 600, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [345, 575, 690, 590], \"label\": \"Section-header\"}, {\"bbox_2d\": [765, 549, 880, 561], \"label\": \"Section-header\"}, {\"bbox_2d\": [443, 649, 600, 665], \"label\": \"Section-header\"}, {\"bbox_2d\": [345, 678, 690, 693], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 979, 526, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [442, 122, 600, 138], \"label\": \"Section-header\"}, {\"bbox_2d\": [441, 173, 603, 189], \"label\": \"Section-header\"}, {\"bbox_2d\": [345, 201, 697, 216], \"label\": \"Section-header\"}, {\"bbox_2d\": [766, 176, 886, 188], \"label\": \"Text\"}, {\"bbox_2d\": [156, 229, 228, 241], \"label\": \"Text\"}, {\"bbox_2d\": [156, 254, 210, 266], \"label\": \"Text\"}, {\"bbox_2d\": [156, 279, 281, 291], \"label\": \"Text\"}, {\"bbox_2d\": [156, 303, 259, 315], \"label\": \"Text\"}, {\"bbox_2d\": [156, 328, 227, 340], \"label\": \"Text\"}, {\"bbox_2d\": [156, 353, 214, 365], \"label\": \"Text\"}, {\"bbox_2d\": [156, 378, 201, 389], \"label\": \"Text\"}, {\"bbox_2d\": [156, 402, 259, 414], \"label\": \"Text\"}, {\"bbox_2d\": [156, 427, 480, 439], \"label\": \"Text\"}, {\"bbox_2d\": [156, 452, 501, 463], \"label\": \"Text\"}, {\"bbox_2d\": [156, 476, 235, 488], \"label\": \"Text\"}, {\"bbox_2d\": [156, 501, 505, 513], \"label\": \"Text\"}, {\"bbox_2d\": [766, 551, 886, 562], \"label\": \"Text\"}, {\"bbox_2d\": [156, 604, 857, 616], \"label\": \"Text\"}, {\"bbox_2d\": [766, 653, 886, 665], \"label\": \"Text\"}, {\"bbox_2d\": [156, 727, 214, 739], \"label\": \"Text\"}, {\"bbox_2d\": [156, 752, 234, 764], \"label\": \"Text\"}, {\"bbox_2d\": [156, 777, 246, 788], \"label\": \"Text\"}, {\"bbox_2d\": [156, 826, 233, 838], \"label\": \"Text\"}, {\"bbox_2d\": [156, 801, 214, 813], \"label\": \"Text\"}, {\"bbox_2d\": [156, 851, 205, 863], \"label\": \"Text\"}, {\"bbox_2d\": [156, 875, 221, 887], \"label\": \"Text\"}, {\"bbox_2d\": [156, 900, 247, 912], \"label\": \"Text\"}, {\"bbox_2d\": [156, 925, 200, 937], \"label\": \"Text\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [441, 649, 603, 666], \"label\": \"Section-header\"}, {\"bbox_2d\": [345, 678, 691, 693], \"label\": \"Section-header\"}, {\"bbox_2d\": [441, 547, 603, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [345, 575, 697, 591], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [138, 96, 212, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 113, 913, 137], \"label\": \"Text\"}, {\"bbox_2d\": [138, 148, 913, 173], \"label\": \"Text\"}, {\"bbox_2d\": [138, 183, 192, 193], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 201, 913, 252], \"label\": \"Text\"}, {\"bbox_2d\": [138, 265, 913, 391], \"label\": \"Text\"}, {\"bbox_2d\": [138, 402, 913, 514], \"label\": \"Text\"}, {\"bbox_2d\": [138, 525, 913, 549], \"label\": \"Text\"}, {\"bbox_2d\": [138, 807, 574, 817], \"label\": \"Text\"}, {\"bbox_2d\": [138, 837, 403, 847], \"label\": \"Text\"}, {\"bbox_2d\": [497, 940, 536, 953], \"label\": \"Page-footer\"}, {\"bbox_2d\": [138, 562, 877, 796], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [137, 93, 215, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 180, 195, 196], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 108, 917, 137], \"label\": \"Text\"}, {\"bbox_2d\": [137, 144, 918, 174], \"label\": \"Text\"}, {\"bbox_2d\": [137, 197, 918, 255], \"label\": \"Text\"}, {\"bbox_2d\": [137, 262, 920, 392], \"label\": \"Text\"}, {\"bbox_2d\": [137, 399, 918, 515], \"label\": \"Text\"}, {\"bbox_2d\": [137, 522, 918, 551], \"label\": \"Text\"}, {\"bbox_2d\": [138, 559, 878, 797], \"label\": \"Table\"}, {\"bbox_2d\": [137, 804, 578, 819], \"label\": \"Text\"}, {\"bbox_2d\": [137, 833, 402, 848], \"label\": \"Text\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [226, 165, 808, 257], \"label\": \"Text\"}, {\"bbox_2d\": [226, 280, 808, 372], \"label\": \"Text\"}, {\"bbox_2d\": [226, 394, 808, 462], \"label\": \"Text\"}, {\"bbox_2d\": [226, 484, 808, 551], \"label\": \"Text\"}, {\"bbox_2d\": [226, 573, 808, 611], \"label\": \"Text\"}, {\"bbox_2d\": [226, 634, 808, 673], \"label\": \"Text\"}, {\"bbox_2d\": [226, 696, 808, 738], \"label\": \"Text\"}, {\"bbox_2d\": [226, 761, 808, 800], \"label\": \"Text\"}, {\"bbox_2d\": [226, 823, 808, 862], \"label\": \"Text\"}, {\"bbox_2d\": [509, 908, 526, 922], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [226, 167, 808, 257], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 281, 808, 371], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 396, 808, 460], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 485, 808, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 573, 808, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 636, 808, 674], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 699, 808, 736], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 761, 808, 799], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 824, 808, 861], \"label\": \"List-item\"}, {\"bbox_2d\": [508, 910, 525, 921], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 152, 112], \"label\": \"Text\"}, {\"bbox_2d\": [121, 169, 778, 184], \"label\": \"Text\"}, {\"bbox_2d\": [207, 202, 911, 330], \"label\": \"Formula\"}, {\"bbox_2d\": [358, 347, 911, 401], \"label\": \"Formula\"}, {\"bbox_2d\": [209, 407, 911, 507], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 527, 911, 608], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 614, 911, 715], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 720, 911, 819], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 824, 911, 941], \"label\": \"Formula\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [207, 834, 911, 950], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 630, 464, 712], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 407, 279, 528], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 202, 279, 322], \"label\": \"Formula\"}, {\"bbox_2d\": [239, 123, 911, 160], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 178, 110], \"label\": \"Text\"}, {\"bbox_2d\": [121, 168, 689, 187], \"label\": \"Text\"}, {\"bbox_2d\": [498, 989, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [239, 121, 910, 157], \"label\": \"Formula\"}, {\"bbox_2d\": [358, 261, 910, 400], \"label\": \"Formula\"}, {\"bbox_2d\": [358, 469, 910, 608], \"label\": \"Formula\"}, {\"bbox_2d\": [358, 678, 910, 817], \"label\": \"Formula\"}, {\"bbox_2d\": [358, 887, 910, 906], \"label\": \"Formula\"}, {\"bbox_2d\": [212, 200, 349, 324], \"label\": \"Picture\"}, {\"bbox_2d\": [213, 409, 349, 533], \"label\": \"Picture\"}, {\"bbox_2d\": [207, 621, 350, 741], \"label\": \"Picture\"}, {\"bbox_2d\": [207, 830, 349, 950], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [217, 88, 800, 464], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 501, 911, 545], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 574, 911, 645], \"label\": \"Text\"}, {\"bbox_2d\": [121, 688, 482, 704], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 734, 911, 940], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [215, 90, 803, 458], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 501, 911, 545], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 574, 911, 645], \"label\": \"Text\"}, {\"bbox_2d\": [121, 733, 911, 940], \"label\": \"Text\"}, {\"bbox_2d\": [121, 689, 482, 702], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 193], \"label\": \"Text\"}, {\"bbox_2d\": [121, 587, 911, 744], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 775, 911, 900], \"label\": \"Text\"}, {\"bbox_2d\": [121, 912, 911, 955], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [241, 209, 789, 553], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 911, 192], \"label\": \"Text\"}, {\"bbox_2d\": [121, 587, 911, 742], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 776, 911, 900], \"label\": \"Text\"}, {\"bbox_2d\": [121, 913, 911, 954], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [223, 206, 794, 562], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [34, 276, 59, 744], \"label\": \"Page-header\"}, {\"bbox_2d\": [248, 113, 739, 126], \"label\": \"Title\"}, {\"bbox_2d\": [309, 146, 677, 155], \"label\": \"Text\"}, {\"bbox_2d\": [216, 168, 772, 195], \"label\": \"Text\"}, {\"bbox_2d\": [392, 203, 596, 215], \"label\": \"Text\"}, {\"bbox_2d\": [156, 228, 831, 291], \"label\": \"Text\"}, {\"bbox_2d\": [409, 308, 578, 317], \"label\": \"Text\"}, {\"bbox_2d\": [121, 384, 478, 677], \"label\": \"Text\"}, {\"bbox_2d\": [121, 680, 478, 913], \"label\": \"Text\"}, {\"bbox_2d\": [509, 711, 867, 913], \"label\": \"Text\"}, {\"bbox_2d\": [539, 674, 834, 685], \"label\": \"Section-header\"}, {\"bbox_2d\": [216, 345, 383, 356], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 346, 867, 436], \"label\": \"Text\"}, {\"bbox_2d\": [509, 439, 867, 607], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [248, 112, 738, 126], \"label\": \"Title\"}, {\"bbox_2d\": [309, 144, 676, 158], \"label\": \"Text\"}, {\"bbox_2d\": [294, 169, 693, 179], \"label\": \"Text\"}, {\"bbox_2d\": [214, 186, 772, 197], \"label\": \"Text\"}, {\"bbox_2d\": [391, 204, 596, 214], \"label\": \"Text\"}, {\"bbox_2d\": [156, 226, 831, 292], \"label\": \"Text\"}, {\"bbox_2d\": [408, 308, 578, 317], \"label\": \"Text\"}, {\"bbox_2d\": [121, 383, 478, 676], \"label\": \"Text\"}, {\"bbox_2d\": [121, 680, 478, 911], \"label\": \"Text\"}, {\"bbox_2d\": [508, 712, 865, 911], \"label\": \"Text\"}, {\"bbox_2d\": [508, 439, 865, 607], \"label\": \"Text\"}, {\"bbox_2d\": [508, 345, 865, 435], \"label\": \"Text\"}, {\"bbox_2d\": [217, 346, 382, 357], \"label\": \"Section-header\"}, {\"bbox_2d\": [540, 675, 834, 685], \"label\": \"Section-header\"}, {\"bbox_2d\": [31, 276, 61, 745], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [457, 53, 601, 71], \"label\": \"Page-header\"}, {\"bbox_2d\": [717, 72, 741, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [321, 74, 344, 86], \"label\": \"Page-header\"}, {\"bbox_2d\": [156, 92, 516, 136], \"label\": \"Text\"}, {\"bbox_2d\": [156, 132, 516, 175], \"label\": \"Text\"}, {\"bbox_2d\": [156, 171, 516, 964], \"label\": \"Text\"}, {\"bbox_2d\": [542, 93, 932, 965], \"label\": \"Text\"}, {\"bbox_2d\": [542, 969, 932, 986], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [320, 76, 345, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [456, 56, 605, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [715, 76, 741, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [541, 94, 913, 900], \"label\": \"Text\"}, {\"bbox_2d\": [543, 903, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [148, 94, 517, 130], \"label\": \"Text\"}, {\"bbox_2d\": [148, 133, 517, 169], \"label\": \"Text\"}, {\"bbox_2d\": [147, 172, 519, 965], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [129, 18, 335, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [71, 68, 133, 79], \"label\": \"Text\"}, {\"bbox_2d\": [71, 704, 391, 714], \"label\": \"Text\"}, {\"bbox_2d\": [60, 997, 170, 1017], \"label\": \"Page-footer\"}, {\"bbox_2d\": [71, 100, 917, 701], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 335, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 71, 133, 79], \"label\": \"Text\"}, {\"bbox_2d\": [69, 102, 916, 699], \"label\": \"Table\"}, {\"bbox_2d\": [72, 707, 392, 716], \"label\": \"Text\"}, {\"bbox_2d\": [58, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [49, 51, 647, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [50, 83, 285, 94], \"label\": \"Text\"}, {\"bbox_2d\": [50, 740, 956, 775], \"label\": \"Text\"}, {\"bbox_2d\": [50, 1001, 220, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [50, 157, 977, 731], \"label\": \"Table\"}, {\"bbox_2d\": [50, 138, 251, 149], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [51, 139, 252, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [50, 152, 979, 734], \"label\": \"Table\"}, {\"bbox_2d\": [51, 740, 955, 777], \"label\": \"Text\"}, {\"bbox_2d\": [51, 56, 646, 78], \"label\": \"Section-header\"}, {\"bbox_2d\": [51, 84, 285, 94], \"label\": \"Text\"}, {\"bbox_2d\": [67, 1001, 113, 1014], \"label\": \"Picture\"}, {\"bbox_2d\": [33, 1003, 46, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [117, 1004, 220, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 77, 937, 129], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 149, 937, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 203, 937, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 293, 937, 327], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 347, 937, 381], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 401, 937, 453], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 473, 937, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 528, 918, 545], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 565, 547, 580], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 599, 355, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 636, 937, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 709, 937, 743], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 763, 937, 797], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 817, 937, 851], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 871, 937, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [84, 977, 181, 992], \"label\": \"Page-footer\"}, {\"bbox_2d\": [921, 957, 953, 972], \"label\": \"Page-footer\"}, {\"bbox_2d\": [915, 977, 953, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [98, 74, 944, 129], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 146, 943, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 200, 948, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 290, 943, 328], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 344, 947, 382], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 398, 944, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 470, 937, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 524, 922, 544], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 632, 942, 688], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 704, 945, 742], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 758, 945, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 812, 942, 850], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 867, 945, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [84, 974, 187, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [913, 955, 959, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 560, 552, 580], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 596, 360, 616], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [299, 93, 333, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [693, 93, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [126, 111, 500, 187], \"label\": \"Text\"}, {\"bbox_2d\": [126, 189, 500, 290], \"label\": \"Text\"}, {\"bbox_2d\": [126, 292, 500, 345], \"label\": \"Text\"}, {\"bbox_2d\": [126, 347, 500, 378], \"label\": \"Text\"}, {\"bbox_2d\": [126, 380, 500, 411], \"label\": \"Text\"}, {\"bbox_2d\": [126, 413, 500, 561], \"label\": \"Text\"}, {\"bbox_2d\": [126, 563, 500, 605], \"label\": \"Text\"}, {\"bbox_2d\": [126, 607, 500, 641], \"label\": \"Text\"}, {\"bbox_2d\": [126, 643, 500, 672], \"label\": \"Text\"}, {\"bbox_2d\": [126, 674, 500, 731], \"label\": \"Text\"}, {\"bbox_2d\": [126, 733, 500, 852], \"label\": \"Text\"}, {\"bbox_2d\": [126, 854, 500, 908], \"label\": \"Text\"}, {\"bbox_2d\": [126, 919, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 113, 895, 166], \"label\": \"Text\"}, {\"bbox_2d\": [526, 169, 895, 275], \"label\": \"Text\"}, {\"bbox_2d\": [526, 278, 895, 561], \"label\": \"Text\"}, {\"bbox_2d\": [526, 563, 895, 605], \"label\": \"Text\"}, {\"bbox_2d\": [526, 607, 895, 650], \"label\": \"Text\"}, {\"bbox_2d\": [526, 652, 895, 694], \"label\": \"Text\"}, {\"bbox_2d\": [526, 696, 895, 718], \"label\": \"Text\"}, {\"bbox_2d\": [526, 720, 895, 740], \"label\": \"Text\"}, {\"bbox_2d\": [526, 742, 895, 772], \"label\": \"Text\"}, {\"bbox_2d\": [526, 774, 895, 856], \"label\": \"Text\"}, {\"bbox_2d\": [526, 858, 895, 888], \"label\": \"Text\"}, {\"bbox_2d\": [526, 890, 895, 944], \"label\": \"Text\"}, {\"bbox_2d\": [526, 946, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 903, 726, 914], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [130, 737, 520, 851], \"label\": \"Text\"}, {\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [299, 93, 332, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [694, 93, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 110, 503, 186], \"label\": \"Text\"}, {\"bbox_2d\": [130, 189, 500, 290], \"label\": \"Text\"}, {\"bbox_2d\": [131, 293, 500, 369], \"label\": \"Text\"}, {\"bbox_2d\": [130, 371, 503, 408], \"label\": \"Text\"}, {\"bbox_2d\": [130, 411, 500, 564], \"label\": \"Text\"}, {\"bbox_2d\": [130, 567, 499, 630], \"label\": \"Text\"}, {\"bbox_2d\": [130, 633, 500, 669], \"label\": \"Text\"}, {\"bbox_2d\": [130, 672, 499, 734], \"label\": \"Text\"}, {\"bbox_2d\": [130, 855, 499, 904], \"label\": \"Text\"}, {\"bbox_2d\": [131, 907, 251, 917], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 920, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 946, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 933, 643, 943], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 881, 896, 930], \"label\": \"Text\"}, {\"bbox_2d\": [526, 776, 896, 878], \"label\": \"Text\"}, {\"bbox_2d\": [526, 750, 891, 773], \"label\": \"Text\"}, {\"bbox_2d\": [526, 711, 896, 747], \"label\": \"Text\"}, {\"bbox_2d\": [526, 672, 894, 708], \"label\": \"Text\"}, {\"bbox_2d\": [526, 617, 895, 669], \"label\": \"Text\"}, {\"bbox_2d\": [526, 563, 895, 614], \"label\": \"Text\"}, {\"bbox_2d\": [524, 275, 898, 559], \"label\": \"Text\"}, {\"bbox_2d\": [526, 165, 895, 271], \"label\": \"Text\"}, {\"bbox_2d\": [526, 110, 898, 162], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [157, 201, 381, 210], \"label\": \"Text\"}, {\"bbox_2d\": [157, 287, 880, 313], \"label\": \"Text\"}, {\"bbox_2d\": [157, 341, 880, 382], \"label\": \"Text\"}, {\"bbox_2d\": [157, 410, 875, 422], \"label\": \"Text\"}, {\"bbox_2d\": [157, 562, 880, 607], \"label\": \"Text\"}, {\"bbox_2d\": [157, 500, 442, 511], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 524, 441, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 681, 365, 693], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 731, 880, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 833, 326, 845], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 858, 441, 870], \"label\": \"List-item\"}, {\"bbox_2d\": [157, 642, 326, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 796, 320, 808], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 899, 880, 926], \"label\": \"Text\"}, {\"bbox_2d\": [508, 979, 527, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [386, 92, 880, 136], \"label\": \"Section-header\"}, {\"bbox_2d\": [477, 168, 556, 184], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 248, 756, 259], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 461, 303, 473], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [779, 96, 886, 108], \"label\": \"Text\"}, {\"bbox_2d\": [156, 198, 386, 210], \"label\": \"Text\"}, {\"bbox_2d\": [156, 287, 884, 314], \"label\": \"Text\"}, {\"bbox_2d\": [156, 342, 884, 383], \"label\": \"Text\"}, {\"bbox_2d\": [156, 411, 880, 423], \"label\": \"Text\"}, {\"bbox_2d\": [156, 564, 884, 605], \"label\": \"Text\"}, {\"bbox_2d\": [156, 500, 446, 511], \"label\": \"Text\"}, {\"bbox_2d\": [156, 524, 445, 536], \"label\": \"Text\"}, {\"bbox_2d\": [156, 682, 370, 694], \"label\": \"Text\"}, {\"bbox_2d\": [156, 707, 445, 719], \"label\": \"Text\"}, {\"bbox_2d\": [156, 732, 884, 758], \"label\": \"Text\"}, {\"bbox_2d\": [156, 835, 331, 847], \"label\": \"Text\"}, {\"bbox_2d\": [156, 860, 445, 872], \"label\": \"Text\"}, {\"bbox_2d\": [156, 899, 884, 926], \"label\": \"Text\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 796, 324, 808], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 643, 330, 655], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 460, 307, 472], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 248, 761, 260], \"label\": \"Section-header\"}, {\"bbox_2d\": [478, 170, 564, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [386, 120, 655, 136], \"label\": \"Section-header\"}, {\"bbox_2d\": [434, 92, 610, 109], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [103, 885, 711, 904], \"label\": \"Text\"}, {\"bbox_2d\": [103, 939, 234, 972], \"label\": \"Page-footer\"}, {\"bbox_2d\": [866, 939, 919, 954], \"label\": \"Page-footer\"}, {\"bbox_2d\": [103, 302, 607, 320], \"label\": \"Section-header\"}, {\"bbox_2d\": [150, 334, 920, 749], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [104, 296, 611, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [150, 330, 920, 751], \"label\": \"Table\"}, {\"bbox_2d\": [104, 885, 714, 904], \"label\": \"Footnote\"}, {\"bbox_2d\": [104, 937, 232, 973], \"label\": \"Page-footer\"}, {\"bbox_2d\": [865, 937, 918, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [824, 43, 926, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 45, 489, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 92, 906, 119], \"label\": \"Text\"}, {\"bbox_2d\": [231, 158, 926, 219], \"label\": \"Text\"}, {\"bbox_2d\": [231, 235, 910, 278], \"label\": \"Text\"}, {\"bbox_2d\": [231, 294, 677, 307], \"label\": \"Text\"}, {\"bbox_2d\": [231, 317, 923, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 355, 923, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 409, 923, 436], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 447, 906, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 517, 926, 590], \"label\": \"List-item\"}, {\"bbox_2d\": [241, 616, 736, 628], \"label\": \"Text\"}, {\"bbox_2d\": [409, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 139, 514, 152], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [109, 46, 489, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [824, 45, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 93, 910, 121], \"label\": \"Text\"}, {\"bbox_2d\": [231, 139, 512, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 158, 942, 218], \"label\": \"Text\"}, {\"bbox_2d\": [231, 234, 916, 278], \"label\": \"Text\"}, {\"bbox_2d\": [231, 295, 677, 307], \"label\": \"Text\"}, {\"bbox_2d\": [231, 317, 926, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 355, 926, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 408, 926, 436], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 446, 909, 505], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 516, 926, 590], \"label\": \"List-item\"}, {\"bbox_2d\": [409, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [241, 615, 737, 627], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [134, 93, 910, 126], \"label\": \"Title\"}, {\"bbox_2d\": [264, 144, 780, 156], \"label\": \"Text\"}, {\"bbox_2d\": [147, 163, 898, 190], \"label\": \"Text\"}, {\"bbox_2d\": [428, 192, 613, 204], \"label\": \"Text\"}, {\"bbox_2d\": [182, 214, 840, 363], \"label\": \"Text\"}, {\"bbox_2d\": [536, 391, 952, 447], \"label\": \"Text\"}, {\"bbox_2d\": [91, 390, 506, 642], \"label\": \"Text\"}, {\"bbox_2d\": [536, 450, 952, 825], \"label\": \"Text\"}, {\"bbox_2d\": [91, 651, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 832, 952, 947], \"label\": \"Text\"}, {\"bbox_2d\": [32, 288, 60, 738], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [108, 72, 112, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [133, 93, 909, 125], \"label\": \"Title\"}, {\"bbox_2d\": [263, 143, 778, 156], \"label\": \"Text\"}, {\"bbox_2d\": [145, 164, 897, 174], \"label\": \"Text\"}, {\"bbox_2d\": [128, 177, 914, 188], \"label\": \"Text\"}, {\"bbox_2d\": [430, 191, 612, 202], \"label\": \"Text\"}, {\"bbox_2d\": [184, 213, 858, 361], \"label\": \"Text\"}, {\"bbox_2d\": [91, 390, 506, 641], \"label\": \"Text\"}, {\"bbox_2d\": [91, 652, 506, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 831, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 453, 951, 825], \"label\": \"Text\"}, {\"bbox_2d\": [536, 390, 951, 447], \"label\": \"Text\"}, {\"bbox_2d\": [31, 285, 61, 739], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 12, 914, 609], \"label\": \"Table\"}, {\"bbox_2d\": [509, 596, 525, 607], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [214, 59, 840, 82], \"label\": \"Text\"}, {\"bbox_2d\": [214, 97, 801, 119], \"label\": \"Text\"}, {\"bbox_2d\": [214, 134, 832, 157], \"label\": \"Text\"}, {\"bbox_2d\": [214, 172, 840, 194], \"label\": \"Text\"}, {\"bbox_2d\": [214, 209, 840, 232], \"label\": \"Text\"}, {\"bbox_2d\": [214, 247, 844, 282], \"label\": \"Text\"}, {\"bbox_2d\": [214, 296, 818, 331], \"label\": \"Text\"}, {\"bbox_2d\": [214, 350, 832, 372], \"label\": \"Text\"}, {\"bbox_2d\": [214, 387, 844, 422], \"label\": \"Text\"}, {\"bbox_2d\": [214, 437, 837, 472], \"label\": \"Text\"}, {\"bbox_2d\": [214, 487, 846, 521], \"label\": \"Text\"}, {\"bbox_2d\": [214, 536, 826, 583], \"label\": \"Text\"}, {\"bbox_2d\": [508, 596, 525, 606], \"label\": \"Page-footer\"}, {\"bbox_2d\": [859, 12, 910, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [466, 49, 595, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [124, 12, 201, 57], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [197, 80, 613, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 113, 637, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 331, 695, 352], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 500, 616, 524], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 533, 637, 555], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 750, 695, 772], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 974, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 130, 832, 149], \"label\": \"Text\"}, {\"bbox_2d\": [302, 148, 592, 324], \"label\": \"Picture\"}, {\"bbox_2d\": [273, 345, 652, 365], \"label\": \"Text\"}, {\"bbox_2d\": [317, 365, 593, 487], \"label\": \"Picture\"}, {\"bbox_2d\": [273, 549, 828, 569], \"label\": \"Text\"}, {\"bbox_2d\": [302, 568, 593, 745], \"label\": \"Picture\"}, {\"bbox_2d\": [317, 784, 593, 907], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [197, 85, 612, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 117, 637, 132], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 335, 694, 350], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 505, 615, 522], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 538, 637, 552], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 756, 694, 770], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 771, 648, 785], \"label\": \"Text\"}, {\"bbox_2d\": [273, 553, 828, 567], \"label\": \"Text\"}, {\"bbox_2d\": [273, 351, 651, 364], \"label\": \"Text\"}, {\"bbox_2d\": [273, 133, 831, 146], \"label\": \"Text\"}, {\"bbox_2d\": [296, 151, 592, 308], \"label\": \"Text\"}, {\"bbox_2d\": [319, 369, 592, 486], \"label\": \"Text\"}, {\"bbox_2d\": [296, 571, 592, 728], \"label\": \"Text\"}, {\"bbox_2d\": [319, 789, 592, 907], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [508, 994, 532, 1010], \"label\": \"Page-footer\"}, {\"bbox_2d\": [28, 360, 38, 730], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 43, 278, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 96, 497, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [527, 96, 967, 978], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [508, 993, 612, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [23, 361, 41, 730], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 41, 276, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 66, 968, 978], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [170, 273, 473, 477], \"label\": \"Picture\"}, {\"bbox_2d\": [565, 294, 869, 477], \"label\": \"Picture\"}, {\"bbox_2d\": [568, 536, 867, 736], \"label\": \"Picture\"}, {\"bbox_2d\": [186, 532, 473, 737], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 485, 400, 497], \"label\": \"Caption\"}, {\"bbox_2d\": [627, 485, 801, 497], \"label\": \"Caption\"}, {\"bbox_2d\": [212, 739, 430, 752], \"label\": \"Caption\"}, {\"bbox_2d\": [298, 764, 344, 775], \"label\": \"Caption\"}, {\"bbox_2d\": [607, 740, 799, 765], \"label\": \"Caption\"}, {\"bbox_2d\": [691, 764, 737, 775], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [173, 277, 471, 478], \"label\": \"Picture\"}, {\"bbox_2d\": [563, 292, 864, 478], \"label\": \"Picture\"}, {\"bbox_2d\": [173, 537, 470, 730], \"label\": \"Picture\"}, {\"bbox_2d\": [565, 535, 863, 730], \"label\": \"Picture\"}, {\"bbox_2d\": [241, 485, 401, 496], \"label\": \"Caption\"}, {\"bbox_2d\": [625, 485, 801, 496], \"label\": \"Caption\"}, {\"bbox_2d\": [627, 739, 804, 775], \"label\": \"Caption\"}, {\"bbox_2d\": [213, 739, 429, 775], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [818, 35, 987, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 123, 511, 207], \"label\": \"Text\"}, {\"bbox_2d\": [109, 232, 510, 315], \"label\": \"Text\"}, {\"bbox_2d\": [125, 398, 496, 496], \"label\": \"Text\"}, {\"bbox_2d\": [125, 513, 496, 630], \"label\": \"Text\"}, {\"bbox_2d\": [125, 647, 496, 902], \"label\": \"Text\"}, {\"bbox_2d\": [125, 920, 496, 966], \"label\": \"Text\"}, {\"bbox_2d\": [542, 906, 943, 966], \"label\": \"Text\"}, {\"bbox_2d\": [542, 486, 943, 569], \"label\": \"Text\"}, {\"bbox_2d\": [542, 340, 943, 461], \"label\": \"Text\"}, {\"bbox_2d\": [542, 195, 943, 315], \"label\": \"Text\"}, {\"bbox_2d\": [542, 123, 943, 191], \"label\": \"Text\"}, {\"bbox_2d\": [125, 360, 446, 376], \"label\": \"Section-header\"}, {\"bbox_2d\": [595, 594, 1035, 885], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [818, 35, 988, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 121, 512, 207], \"label\": \"Text\"}, {\"bbox_2d\": [109, 229, 512, 315], \"label\": \"Text\"}, {\"bbox_2d\": [541, 121, 944, 189], \"label\": \"Text\"}, {\"bbox_2d\": [541, 193, 944, 315], \"label\": \"Text\"}, {\"bbox_2d\": [541, 337, 944, 460], \"label\": \"Text\"}, {\"bbox_2d\": [541, 482, 944, 568], \"label\": \"Text\"}, {\"bbox_2d\": [125, 396, 501, 494], \"label\": \"Text\"}, {\"bbox_2d\": [125, 646, 468, 901], \"label\": \"Text\"}, {\"bbox_2d\": [125, 512, 500, 576], \"label\": \"Text\"}, {\"bbox_2d\": [125, 582, 468, 628], \"label\": \"Text\"}, {\"bbox_2d\": [125, 919, 501, 965], \"label\": \"Text\"}, {\"bbox_2d\": [125, 357, 447, 373], \"label\": \"Section-header\"}, {\"bbox_2d\": [482, 592, 1035, 887], \"label\": \"Picture\"}, {\"bbox_2d\": [541, 902, 948, 967], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 33, 411, 88], \"label\": \"Picture\"}, {\"bbox_2d\": [125, 138, 912, 166], \"label\": \"Text\"}, {\"bbox_2d\": [125, 187, 911, 246], \"label\": \"Text\"}, {\"bbox_2d\": [125, 313, 912, 401], \"label\": \"Text\"}, {\"bbox_2d\": [172, 267, 911, 294], \"label\": \"Section-header\"}, {\"bbox_2d\": [534, 970, 551, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [123, 136, 918, 166], \"label\": \"Text\"}, {\"bbox_2d\": [123, 185, 918, 245], \"label\": \"Text\"}, {\"bbox_2d\": [123, 311, 918, 401], \"label\": \"Text\"}, {\"bbox_2d\": [170, 264, 918, 294], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [123, 34, 408, 86], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [589, 130, 907, 157], \"label\": \"Page-header\"}, {\"bbox_2d\": [1018, 141, 1026, 152], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 189, 906, 961], \"label\": \"Table\"}, {\"bbox_2d\": [88, 971, 356, 980], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [589, 126, 907, 158], \"label\": \"Page-header\"}, {\"bbox_2d\": [1016, 137, 1028, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [85, 188, 905, 961], \"label\": \"Table\"}, {\"bbox_2d\": [87, 971, 356, 979], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 141, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 64, 92, 75], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 78, 1009, 149], \"label\": \"Text\"}, {\"bbox_2d\": [25, 155, 1009, 207], \"label\": \"Text\"}, {\"bbox_2d\": [25, 217, 1009, 365], \"label\": \"Text\"}, {\"bbox_2d\": [25, 370, 1009, 443], \"label\": \"Text\"}, {\"bbox_2d\": [58, 455, 666, 466], \"label\": \"Text\"}, {\"bbox_2d\": [509, 665, 526, 676], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 475, 1011, 605], \"label\": \"Table\"}, {\"bbox_2d\": [55, 608, 409, 618], \"label\": \"Text\"}, {\"bbox_2d\": [55, 624, 971, 634], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 140, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 64, 93, 75], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 665, 525, 675], \"label\": \"Page-footer\"}, {\"bbox_2d\": [55, 624, 971, 635], \"label\": \"Text\"}, {\"bbox_2d\": [55, 607, 410, 618], \"label\": \"Text\"}, {\"bbox_2d\": [58, 455, 667, 466], \"label\": \"Text\"}, {\"bbox_2d\": [25, 370, 1009, 449], \"label\": \"Text\"}, {\"bbox_2d\": [25, 217, 1009, 364], \"label\": \"Text\"}, {\"bbox_2d\": [25, 155, 1009, 211], \"label\": \"Text\"}, {\"bbox_2d\": [25, 81, 1009, 148], \"label\": \"Text\"}, {\"bbox_2d\": [22, 474, 1010, 591], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 97, 914, 136], \"label\": \"Text\"}, {\"bbox_2d\": [121, 161, 758, 172], \"label\": \"Text\"}, {\"bbox_2d\": [121, 197, 914, 236], \"label\": \"Text\"}, {\"bbox_2d\": [121, 260, 914, 299], \"label\": \"Text\"}, {\"bbox_2d\": [121, 323, 914, 361], \"label\": \"Text\"}, {\"bbox_2d\": [121, 385, 558, 397], \"label\": \"Text\"}, {\"bbox_2d\": [121, 422, 914, 487], \"label\": \"Text\"}, {\"bbox_2d\": [121, 511, 914, 549], \"label\": \"Text\"}, {\"bbox_2d\": [121, 573, 914, 611], \"label\": \"Text\"}, {\"bbox_2d\": [121, 635, 914, 673], \"label\": \"Text\"}, {\"bbox_2d\": [121, 697, 914, 736], \"label\": \"Text\"}, {\"bbox_2d\": [121, 760, 914, 799], \"label\": \"Text\"}, {\"bbox_2d\": [121, 823, 725, 835], \"label\": \"Text\"}, {\"bbox_2d\": [121, 859, 914, 871], \"label\": \"Text\"}, {\"bbox_2d\": [121, 896, 914, 934], \"label\": \"Text\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 98, 914, 135], \"label\": \"Text\"}, {\"bbox_2d\": [121, 160, 759, 172], \"label\": \"Text\"}, {\"bbox_2d\": [121, 197, 914, 234], \"label\": \"Text\"}, {\"bbox_2d\": [121, 259, 914, 297], \"label\": \"Text\"}, {\"bbox_2d\": [121, 322, 914, 359], \"label\": \"Text\"}, {\"bbox_2d\": [121, 384, 559, 396], \"label\": \"Text\"}, {\"bbox_2d\": [121, 421, 914, 485], \"label\": \"Text\"}, {\"bbox_2d\": [121, 510, 914, 547], \"label\": \"Text\"}, {\"bbox_2d\": [121, 572, 914, 610], \"label\": \"Text\"}, {\"bbox_2d\": [121, 635, 914, 672], \"label\": \"Text\"}, {\"bbox_2d\": [121, 697, 914, 735], \"label\": \"Text\"}, {\"bbox_2d\": [121, 760, 914, 797], \"label\": \"Text\"}, {\"bbox_2d\": [121, 822, 725, 834], \"label\": \"Text\"}, {\"bbox_2d\": [121, 859, 914, 870], \"label\": \"Text\"}, {\"bbox_2d\": [121, 895, 914, 933], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [780, 73, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 143, 895, 204], \"label\": \"Text\"}, {\"bbox_2d\": [526, 212, 895, 390], \"label\": \"Text\"}, {\"bbox_2d\": [130, 328, 503, 560], \"label\": \"Text\"}, {\"bbox_2d\": [130, 566, 503, 656], \"label\": \"Text\"}, {\"bbox_2d\": [130, 661, 503, 915], \"label\": \"Text\"}, {\"bbox_2d\": [130, 920, 503, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 920, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 879, 895, 913], \"label\": \"Text\"}, {\"bbox_2d\": [526, 644, 895, 758], \"label\": \"Text\"}, {\"bbox_2d\": [526, 521, 895, 636], \"label\": \"Text\"}, {\"bbox_2d\": [526, 398, 895, 513], \"label\": \"Text\"}, {\"bbox_2d\": [158, 143, 369, 301], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [779, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 142, 895, 204], \"label\": \"Text\"}, {\"bbox_2d\": [526, 212, 895, 391], \"label\": \"Text\"}, {\"bbox_2d\": [526, 399, 895, 513], \"label\": \"Text\"}, {\"bbox_2d\": [130, 327, 500, 507], \"label\": \"Text\"}, {\"bbox_2d\": [130, 511, 503, 678], \"label\": \"Text\"}, {\"bbox_2d\": [130, 683, 500, 915], \"label\": \"Text\"}, {\"bbox_2d\": [130, 920, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 920, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 849, 895, 911], \"label\": \"Text\"}, {\"bbox_2d\": [526, 766, 895, 841], \"label\": \"Text\"}, {\"bbox_2d\": [526, 643, 898, 758], \"label\": \"Text\"}, {\"bbox_2d\": [526, 521, 895, 635], \"label\": \"Text\"}, {\"bbox_2d\": [156, 142, 371, 303], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [49, 36, 764, 880], \"label\": \"Picture\"}, {\"bbox_2d\": [46, 998, 170, 1015], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [53, 35, 1035, 983], \"label\": \"Picture\"}, {\"bbox_2d\": [43, 999, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 26, 352, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 94, 507, 136], \"label\": \"Text\"}, {\"bbox_2d\": [91, 176, 501, 186], \"label\": \"Text\"}, {\"bbox_2d\": [91, 192, 462, 202], \"label\": \"Text\"}, {\"bbox_2d\": [91, 208, 506, 228], \"label\": \"Text\"}, {\"bbox_2d\": [91, 254, 486, 279], \"label\": \"Text\"}, {\"bbox_2d\": [91, 289, 507, 314], \"label\": \"Text\"}, {\"bbox_2d\": [91, 344, 507, 414], \"label\": \"Text\"}, {\"bbox_2d\": [527, 274, 946, 359], \"label\": \"Text\"}, {\"bbox_2d\": [527, 366, 946, 391], \"label\": \"Text\"}, {\"bbox_2d\": [527, 393, 946, 420], \"label\": \"Text\"}, {\"bbox_2d\": [527, 422, 946, 447], \"label\": \"Text\"}, {\"bbox_2d\": [91, 445, 414, 455], \"label\": \"Text\"}, {\"bbox_2d\": [91, 461, 507, 485], \"label\": \"Text\"}, {\"bbox_2d\": [527, 93, 687, 104], \"label\": \"List-item\"}, {\"bbox_2d\": [527, 109, 946, 134], \"label\": \"List-item\"}, {\"bbox_2d\": [527, 145, 946, 170], \"label\": \"Text\"}, {\"bbox_2d\": [527, 176, 946, 199], \"label\": \"List-item\"}, {\"bbox_2d\": [527, 204, 827, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [527, 219, 946, 243], \"label\": \"List-item\"}, {\"bbox_2d\": [527, 259, 747, 270], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 329, 255, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 240, 245, 250], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 317, 255, 328], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 429, 272, 440], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 504, 760, 525], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 504, 434, 525], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 536, 905, 605], \"label\": \"Picture\"}, {\"bbox_2d\": [106, 536, 461, 605], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 999, 444, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 508, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 240, 244, 250], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 330, 255, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 430, 271, 440], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 493, 509, 615], \"label\": \"Picture\"}, {\"bbox_2d\": [528, 494, 947, 619], \"label\": \"Picture\"}, {\"bbox_2d\": [528, 260, 747, 270], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 274, 944, 356], \"label\": \"Text\"}, {\"bbox_2d\": [528, 366, 944, 390], \"label\": \"Text\"}, {\"bbox_2d\": [91, 345, 507, 412], \"label\": \"Text\"}, {\"bbox_2d\": [91, 444, 415, 454], \"label\": \"Text\"}, {\"bbox_2d\": [91, 255, 507, 279], \"label\": \"Text\"}, {\"bbox_2d\": [91, 289, 507, 313], \"label\": \"Text\"}, {\"bbox_2d\": [91, 109, 507, 134], \"label\": \"Text\"}, {\"bbox_2d\": [528, 146, 944, 170], \"label\": \"Text\"}, {\"bbox_2d\": [91, 138, 275, 148], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 153, 411, 163], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 189, 462, 199], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 204, 507, 228], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 175, 501, 185], \"label\": \"Text\"}, {\"bbox_2d\": [528, 95, 687, 105], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 109, 944, 134], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 175, 944, 199], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 204, 828, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 218, 944, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 395, 944, 419], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 423, 944, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 459, 507, 483], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 1000, 304, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [315, 1000, 444, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [532, 0, 951, 12], \"label\": \"Page-header\"}, {\"bbox_2d\": [77, 23, 994, 69], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 96, 505, 113], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 141, 666, 159], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 187, 994, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 262, 994, 332], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 360, 990, 378], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 406, 764, 424], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 452, 994, 498], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 525, 994, 571], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 600, 994, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 698, 811, 716], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 744, 994, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [41, 818, 994, 917], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 945, 680, 963], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 991, 994, 1009], \"label\": \"List-item\"}, {\"bbox_2d\": [930, 1021, 951, 1035], \"label\": \"Page-footer\"}, {\"bbox_2d\": [29, 1021, 57, 1035], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [0, 0, 510, 13], \"label\": \"Page-header\"}, {\"bbox_2d\": [530, 0, 1035, 13], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 23, 993, 68], \"label\": \"Text\"}, {\"bbox_2d\": [78, 96, 505, 113], \"label\": \"Text\"}, {\"bbox_2d\": [78, 142, 666, 159], \"label\": \"Text\"}, {\"bbox_2d\": [78, 188, 993, 232], \"label\": \"Text\"}, {\"bbox_2d\": [42, 261, 993, 332], \"label\": \"Text\"}, {\"bbox_2d\": [78, 361, 990, 378], \"label\": \"Text\"}, {\"bbox_2d\": [78, 406, 764, 424], \"label\": \"Text\"}, {\"bbox_2d\": [78, 452, 993, 497], \"label\": \"Text\"}, {\"bbox_2d\": [78, 525, 993, 570], \"label\": \"Text\"}, {\"bbox_2d\": [78, 598, 993, 670], \"label\": \"Text\"}, {\"bbox_2d\": [78, 698, 811, 716], \"label\": \"Text\"}, {\"bbox_2d\": [78, 744, 993, 789], \"label\": \"Text\"}, {\"bbox_2d\": [42, 817, 993, 916], \"label\": \"Text\"}, {\"bbox_2d\": [78, 944, 680, 962], \"label\": \"Text\"}, {\"bbox_2d\": [78, 990, 993, 1007], \"label\": \"Text\"}, {\"bbox_2d\": [0, 1022, 56, 1036], \"label\": \"Page-footer\"}, {\"bbox_2d\": [929, 1022, 1036, 1036], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 96, 348, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 309, 598, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 142, 911, 265], \"label\": \"Text\"}, {\"bbox_2d\": [121, 355, 911, 478], \"label\": \"Text\"}, {\"bbox_2d\": [121, 539, 911, 747], \"label\": \"Text\"}, {\"bbox_2d\": [121, 759, 911, 910], \"label\": \"Text\"}, {\"bbox_2d\": [121, 922, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [382, 501, 911, 517], \"label\": \"Formula\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [147, 97, 348, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 309, 598, 322], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 266], \"label\": \"Text\"}, {\"bbox_2d\": [121, 354, 911, 479], \"label\": \"Text\"}, {\"bbox_2d\": [121, 539, 911, 746], \"label\": \"Text\"}, {\"bbox_2d\": [121, 758, 911, 910], \"label\": \"Text\"}, {\"bbox_2d\": [121, 922, 910, 965], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [383, 501, 911, 517], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 97, 594, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 132, 381, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 517, 474, 528], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 164, 914, 304], \"label\": \"Text\"}, {\"bbox_2d\": [154, 550, 611, 561], \"label\": \"Text\"}, {\"bbox_2d\": [121, 790, 628, 802], \"label\": \"Text\"}, {\"bbox_2d\": [152, 326, 910, 501], \"label\": \"Table\"}, {\"bbox_2d\": [153, 580, 910, 867], \"label\": \"Table\"}, {\"bbox_2d\": [498, 955, 536, 969], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 598, 113], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 129, 386, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 513, 474, 530], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 161, 921, 305], \"label\": \"Text\"}, {\"bbox_2d\": [155, 545, 611, 562], \"label\": \"Text\"}, {\"bbox_2d\": [143, 323, 917, 498], \"label\": \"Table\"}, {\"bbox_2d\": [143, 577, 913, 867], \"label\": \"Table\"}, {\"bbox_2d\": [498, 953, 541, 971], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 73, 285, 85], \"label\": \"Text\"}, {\"bbox_2d\": [91, 136, 506, 488], \"label\": \"Text\"}, {\"bbox_2d\": [91, 544, 506, 864], \"label\": \"Text\"}, {\"bbox_2d\": [91, 872, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 783, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 632, 951, 766], \"label\": \"Text\"}, {\"bbox_2d\": [536, 573, 951, 624], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 251, 951, 396], \"label\": \"Caption\"}, {\"bbox_2d\": [567, 70, 947, 211], \"label\": \"Picture\"}, {\"bbox_2d\": [567, 391, 923, 560], \"label\": \"Picture\"}, {\"bbox_2d\": [181, 496, 506, 542], \"label\": \"Formula\"}, {\"bbox_2d\": [181, 91, 506, 144], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 250, 951, 371], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 71, 287, 84], \"label\": \"Text\"}, {\"bbox_2d\": [91, 134, 506, 492], \"label\": \"Text\"}, {\"bbox_2d\": [244, 91, 506, 106], \"label\": \"Formula\"}, {\"bbox_2d\": [183, 110, 506, 134], \"label\": \"Formula\"}, {\"bbox_2d\": [240, 499, 506, 524], \"label\": \"Formula\"}, {\"bbox_2d\": [179, 518, 506, 543], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 572, 951, 606], \"label\": \"Caption\"}, {\"bbox_2d\": [540, 388, 946, 558], \"label\": \"Picture\"}, {\"bbox_2d\": [531, 62, 949, 236], \"label\": \"Picture\"}, {\"bbox_2d\": [536, 632, 951, 775], \"label\": \"Text\"}, {\"bbox_2d\": [536, 784, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 874, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 543, 506, 881], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [430, 28, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [116, 97, 490, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 319, 284, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 143, 537, 159], \"label\": \"Text\"}, {\"bbox_2d\": [116, 167, 937, 202], \"label\": \"Text\"}, {\"bbox_2d\": [116, 232, 625, 248], \"label\": \"Text\"}, {\"bbox_2d\": [116, 256, 648, 269], \"label\": \"Text\"}, {\"bbox_2d\": [116, 366, 471, 381], \"label\": \"Text\"}, {\"bbox_2d\": [116, 389, 624, 402], \"label\": \"Text\"}, {\"bbox_2d\": [116, 454, 537, 469], \"label\": \"Text\"}, {\"bbox_2d\": [116, 477, 588, 491], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [115, 92, 492, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 313, 284, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 138, 946, 204], \"label\": \"Text\"}, {\"bbox_2d\": [115, 226, 650, 270], \"label\": \"Text\"}, {\"bbox_2d\": [115, 359, 625, 403], \"label\": \"Text\"}, {\"bbox_2d\": [115, 448, 590, 492], \"label\": \"Text\"}, {\"bbox_2d\": [0, 518, 1034, 1034], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [773, 27, 952, 42], \"label\": \"Page-header\"}, {\"bbox_2d\": [58, 69, 975, 116], \"label\": \"Text\"}, {\"bbox_2d\": [58, 452, 968, 481], \"label\": \"Text\"}, {\"bbox_2d\": [58, 620, 955, 650], \"label\": \"Text\"}, {\"bbox_2d\": [485, 994, 529, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [136, 136, 576, 435], \"label\": \"Picture\"}, {\"bbox_2d\": [137, 502, 247, 591], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [774, 29, 952, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [58, 71, 976, 116], \"label\": \"Text\"}, {\"bbox_2d\": [58, 453, 968, 481], \"label\": \"Text\"}, {\"bbox_2d\": [58, 622, 956, 650], \"label\": \"Text\"}, {\"bbox_2d\": [135, 137, 573, 432], \"label\": \"Picture\"}, {\"bbox_2d\": [142, 505, 245, 589], \"label\": \"Picture\"}, {\"bbox_2d\": [485, 994, 529, 1006], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [86, 44, 962, 89], \"label\": \"Text\"}, {\"bbox_2d\": [86, 103, 939, 133], \"label\": \"Text\"}, {\"bbox_2d\": [86, 147, 967, 256], \"label\": \"Text\"}, {\"bbox_2d\": [86, 286, 960, 331], \"label\": \"Text\"}, {\"bbox_2d\": [86, 345, 942, 391], \"label\": \"Text\"}, {\"bbox_2d\": [86, 579, 963, 626], \"label\": \"Text\"}, {\"bbox_2d\": [86, 654, 925, 684], \"label\": \"Text\"}, {\"bbox_2d\": [86, 698, 928, 728], \"label\": \"Text\"}, {\"bbox_2d\": [86, 639, 193, 653], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 269, 248, 283], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 999, 99, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [86, 404, 944, 560], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [85, 42, 963, 89], \"label\": \"Text\"}, {\"bbox_2d\": [85, 101, 940, 132], \"label\": \"Text\"}, {\"bbox_2d\": [85, 145, 967, 255], \"label\": \"Text\"}, {\"bbox_2d\": [85, 284, 959, 331], \"label\": \"Text\"}, {\"bbox_2d\": [85, 343, 942, 390], \"label\": \"Text\"}, {\"bbox_2d\": [85, 578, 962, 625], \"label\": \"Text\"}, {\"bbox_2d\": [85, 653, 920, 684], \"label\": \"Text\"}, {\"bbox_2d\": [85, 697, 929, 728], \"label\": \"Text\"}, {\"bbox_2d\": [85, 635, 193, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 266, 248, 284], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 997, 100, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 399, 949, 564], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [680, 25, 936, 43], \"label\": \"Page-header\"}, {\"bbox_2d\": [104, 67, 175, 103], \"label\": \"Picture\"}, {\"bbox_2d\": [104, 127, 542, 144], \"label\": \"Picture\"}, {\"bbox_2d\": [84, 161, 979, 954], \"label\": \"Picture\"}, {\"bbox_2d\": [84, 985, 183, 1002], \"label\": \"Section-header\"}, {\"bbox_2d\": [946, 968, 956, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [679, 22, 942, 43], \"label\": \"Page-header\"}, {\"bbox_2d\": [96, 68, 282, 107], \"label\": \"Picture\"}, {\"bbox_2d\": [913, 965, 959, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 985, 187, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [83, 122, 980, 953], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [123, 48, 201, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 65, 294, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [184, 127, 917, 176], \"label\": \"Text\"}, {\"bbox_2d\": [146, 176, 481, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 192, 551, 208], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 208, 895, 243], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 243, 880, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 326, 917, 357], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 357, 917, 406], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 406, 917, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 441, 488, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 457, 917, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 507, 917, 572], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 572, 917, 621], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 292, 667, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 308, 917, 324], \"label\": \"Text\"}, {\"bbox_2d\": [123, 621, 917, 670], \"label\": \"Text\"}, {\"bbox_2d\": [123, 670, 917, 738], \"label\": \"Text\"}, {\"bbox_2d\": [123, 755, 917, 804], \"label\": \"Text\"}, {\"bbox_2d\": [123, 798, 398, 819], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 972, 440, 988], \"label\": \"Page-footer\"}, {\"bbox_2d\": [123, 822, 917, 871], \"label\": \"Text\"}, {\"bbox_2d\": [184, 871, 889, 888], \"label\": \"Text\"}, {\"bbox_2d\": [146, 888, 912, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 920, 737, 936], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"Text\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"Text\"}, {\"bbox_2d\": [121, 126, 952, 176], \"label\": \"Text\"}, {\"bbox_2d\": [182, 308, 934, 325], \"label\": \"Text\"}, {\"bbox_2d\": [121, 623, 950, 672], \"label\": \"Text\"}, {\"bbox_2d\": [121, 672, 958, 738], \"label\": \"Text\"}, {\"bbox_2d\": [121, 755, 951, 804], \"label\": \"Text\"}, {\"bbox_2d\": [121, 821, 946, 871], \"label\": \"Text\"}, {\"bbox_2d\": [182, 871, 906, 887], \"label\": \"Text\"}, {\"bbox_2d\": [145, 176, 489, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 193, 563, 209], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 209, 933, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 242, 908, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 325, 950, 358], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 358, 946, 407], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 408, 947, 440], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 441, 488, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 457, 963, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 507, 949, 573], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 573, 955, 622], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 888, 937, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 921, 750, 937], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 805, 398, 821], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 739, 308, 755], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 292, 674, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [547, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 45, 210, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 94, 928, 284], \"label\": \"Table\"}, {\"bbox_2d\": [109, 309, 421, 325], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 343, 926, 449], \"label\": \"Text\"}, {\"bbox_2d\": [231, 467, 926, 556], \"label\": \"Text\"}, {\"bbox_2d\": [231, 573, 660, 584], \"label\": \"Caption\"}, {\"bbox_2d\": [229, 586, 929, 940], \"label\": \"Table\"}, {\"bbox_2d\": [109, 987, 682, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 45, 211, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [546, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [230, 92, 927, 284], \"label\": \"Table\"}, {\"bbox_2d\": [109, 309, 422, 324], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 343, 926, 449], \"label\": \"Text\"}, {\"bbox_2d\": [231, 466, 926, 556], \"label\": \"Text\"}, {\"bbox_2d\": [231, 573, 659, 584], \"label\": \"Caption\"}, {\"bbox_2d\": [230, 585, 932, 942], \"label\": \"Table\"}, {\"bbox_2d\": [109, 987, 684, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [58, 35, 158, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 206, 432, 311], \"label\": \"Section-header\"}, {\"bbox_2d\": [683, 449, 840, 500], \"label\": \"Section-header\"}, {\"bbox_2d\": [683, 559, 766, 575], \"label\": \"Section-header\"}, {\"bbox_2d\": [683, 512, 756, 526], \"label\": \"Text\"}, {\"bbox_2d\": [683, 585, 949, 678], \"label\": \"Text\"}, {\"bbox_2d\": [683, 688, 953, 771], \"label\": \"Text\"}, {\"bbox_2d\": [683, 783, 952, 826], \"label\": \"Text\"}, {\"bbox_2d\": [78, 983, 194, 996], \"label\": \"Page-footer\"}, {\"bbox_2d\": [683, 842, 957, 952], \"label\": \"Picture\"}, {\"bbox_2d\": [683, 110, 957, 401], \"label\": \"Picture\"}, {\"bbox_2d\": [50, 322, 574, 958], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [57, 34, 157, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 194, 445, 309], \"label\": \"Section-header\"}, {\"bbox_2d\": [684, 443, 838, 500], \"label\": \"Section-header\"}, {\"bbox_2d\": [684, 556, 765, 575], \"label\": \"Section-header\"}, {\"bbox_2d\": [684, 509, 756, 527], \"label\": \"Text\"}, {\"bbox_2d\": [684, 582, 948, 680], \"label\": \"Text\"}, {\"bbox_2d\": [684, 687, 958, 773], \"label\": \"Text\"}, {\"bbox_2d\": [684, 781, 952, 829], \"label\": \"Text\"}, {\"bbox_2d\": [683, 111, 956, 401], \"label\": \"Picture\"}, {\"bbox_2d\": [683, 843, 957, 952], \"label\": \"Picture\"}, {\"bbox_2d\": [96, 381, 573, 963], \"label\": \"Picture\"}, {\"bbox_2d\": [78, 984, 193, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [18, 977, 64, 1008], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [204, 89, 883, 133], \"label\": \"Text\"}, {\"bbox_2d\": [156, 154, 883, 182], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 193, 883, 221], \"label\": \"Text\"}, {\"bbox_2d\": [171, 233, 691, 248], \"label\": \"Text\"}, {\"bbox_2d\": [235, 254, 883, 281], \"label\": \"Text\"}, {\"bbox_2d\": [171, 292, 508, 305], \"label\": \"Text\"}, {\"bbox_2d\": [204, 327, 650, 341], \"label\": \"Section-header\"}, {\"bbox_2d\": [204, 351, 883, 409], \"label\": \"Text\"}, {\"bbox_2d\": [204, 416, 883, 459], \"label\": \"Text\"}, {\"bbox_2d\": [222, 466, 862, 476], \"label\": \"Text\"}, {\"bbox_2d\": [235, 485, 789, 499], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 506, 701, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 526, 748, 538], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 545, 640, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 565, 883, 591], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 598, 467, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 618, 845, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 638, 687, 651], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 658, 883, 685], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 707, 621, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 702, 883, 730], \"label\": \"Text\"}, {\"bbox_2d\": [204, 737, 883, 781], \"label\": \"Text\"}, {\"bbox_2d\": [156, 791, 883, 820], \"label\": \"Text\"}, {\"bbox_2d\": [185, 827, 559, 840], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 847, 513, 859], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 887, 774, 900], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 919, 261, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [513, 977, 522, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [205, 91, 884, 132], \"label\": \"Text\"}, {\"bbox_2d\": [171, 234, 691, 246], \"label\": \"Text\"}, {\"bbox_2d\": [156, 195, 884, 221], \"label\": \"Text\"}, {\"bbox_2d\": [171, 293, 512, 305], \"label\": \"Text\"}, {\"bbox_2d\": [205, 353, 884, 409], \"label\": \"Text\"}, {\"bbox_2d\": [205, 417, 884, 459], \"label\": \"Text\"}, {\"bbox_2d\": [220, 467, 866, 478], \"label\": \"Text\"}, {\"bbox_2d\": [156, 704, 884, 731], \"label\": \"Text\"}, {\"bbox_2d\": [205, 739, 884, 780], \"label\": \"Text\"}, {\"bbox_2d\": [156, 793, 884, 820], \"label\": \"Text\"}, {\"bbox_2d\": [156, 921, 261, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [513, 979, 522, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [186, 828, 561, 839], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 848, 513, 859], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 887, 775, 899], \"label\": \"List-item\"}, {\"bbox_2d\": [269, 867, 307, 879], \"label\": \"Text\"}, {\"bbox_2d\": [235, 487, 791, 498], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 506, 703, 518], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 526, 751, 538], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 546, 644, 557], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 566, 884, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 600, 466, 612], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 620, 846, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 640, 687, 652], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 660, 883, 671], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 679, 622, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 155, 884, 182], \"label\": \"Section-header\"}, {\"bbox_2d\": [235, 254, 880, 281], \"label\": \"Text\"}, {\"bbox_2d\": [205, 328, 653, 340], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [737, 83, 938, 113], \"label\": \"Page-header\"}, {\"bbox_2d\": [476, 1014, 501, 1026], \"label\": \"Page-footer\"}, {\"bbox_2d\": [365, 894, 657, 907], \"label\": \"Caption\"}, {\"bbox_2d\": [173, 592, 721, 875], \"label\": \"Picture\"}, {\"bbox_2d\": [104, 157, 908, 542], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [736, 82, 958, 112], \"label\": \"Page-header\"}, {\"bbox_2d\": [503, 1006, 536, 1021], \"label\": \"Page-footer\"}, {\"bbox_2d\": [107, 158, 919, 877], \"label\": \"Picture\"}, {\"bbox_2d\": [365, 889, 658, 904], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 205, 522, 270], \"label\": \"Text\"}, {\"bbox_2d\": [559, 208, 955, 257], \"label\": \"Text\"}, {\"bbox_2d\": [122, 326, 962, 363], \"label\": \"Text\"}, {\"bbox_2d\": [122, 310, 303, 322], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 405, 238, 417], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 423, 523, 434], \"label\": \"Text\"}, {\"bbox_2d\": [122, 551, 961, 573], \"label\": \"Text\"}, {\"bbox_2d\": [122, 635, 456, 646], \"label\": \"Text\"}, {\"bbox_2d\": [122, 616, 236, 628], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 776, 393, 788], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 794, 622, 804], \"label\": \"Text\"}, {\"bbox_2d\": [813, 1002, 985, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [1012, 794, 1023, 861], \"label\": \"Page-header\"}, {\"bbox_2d\": [122, 436, 967, 537], \"label\": \"Table\"}, {\"bbox_2d\": [122, 642, 963, 733], \"label\": \"Table\"}, {\"bbox_2d\": [122, 808, 963, 918], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [560, 209, 956, 258], \"label\": \"Text\"}, {\"bbox_2d\": [123, 209, 523, 270], \"label\": \"Text\"}, {\"bbox_2d\": [123, 330, 962, 366], \"label\": \"Text\"}, {\"bbox_2d\": [123, 408, 237, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 312, 303, 322], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 778, 393, 788], \"label\": \"Section-header\"}, {\"bbox_2d\": [813, 1002, 985, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 806, 967, 920], \"label\": \"Table\"}, {\"bbox_2d\": [125, 795, 622, 807], \"label\": \"Text\"}, {\"bbox_2d\": [123, 647, 967, 735], \"label\": \"Table\"}, {\"bbox_2d\": [123, 637, 456, 648], \"label\": \"Text\"}, {\"bbox_2d\": [125, 619, 237, 629], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 551, 962, 575], \"label\": \"Text\"}, {\"bbox_2d\": [122, 436, 974, 536], \"label\": \"Table\"}, {\"bbox_2d\": [123, 426, 522, 437], \"label\": \"Text\"}, {\"bbox_2d\": [1014, 771, 1024, 861], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [7, 59, 28, 965], \"label\": \"Page-header\"}, {\"bbox_2d\": [494, 208, 570, 227], \"label\": \"Section-header\"}, {\"bbox_2d\": [494, 266, 591, 284], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 209, 182, 220], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 871, 390, 957], \"label\": \"Text\"}, {\"bbox_2d\": [540, 849, 836, 957], \"label\": \"Text\"}, {\"bbox_2d\": [540, 846, 611, 855], \"label\": \"Section-header\"}, {\"bbox_2d\": [494, 235, 716, 245], \"label\": \"Section-header\"}, {\"bbox_2d\": [494, 289, 880, 337], \"label\": \"Text\"}, {\"bbox_2d\": [494, 352, 864, 371], \"label\": \"Text\"}, {\"bbox_2d\": [494, 429, 888, 526], \"label\": \"Text\"}, {\"bbox_2d\": [123, 243, 387, 844], \"label\": \"Table\"}, {\"bbox_2d\": [537, 554, 803, 812], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [12, 59, 22, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 140, 22, 235], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 264, 22, 320], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 362, 22, 431], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 470, 22, 532], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 580, 22, 631], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 682, 22, 739], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 769, 22, 861], \"label\": \"Page-header\"}, {\"bbox_2d\": [12, 878, 22, 962], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 209, 184, 219], \"label\": \"Section-header\"}, {\"bbox_2d\": [493, 210, 571, 227], \"label\": \"Section-header\"}, {\"bbox_2d\": [493, 265, 592, 282], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 872, 391, 959], \"label\": \"Text\"}, {\"bbox_2d\": [493, 234, 718, 244], \"label\": \"Text\"}, {\"bbox_2d\": [493, 289, 883, 336], \"label\": \"Text\"}, {\"bbox_2d\": [493, 352, 866, 370], \"label\": \"Text\"}, {\"bbox_2d\": [493, 429, 892, 525], \"label\": \"Text\"}, {\"bbox_2d\": [540, 859, 767, 872], \"label\": \"Text\"}, {\"bbox_2d\": [540, 876, 723, 889], \"label\": \"Text\"}, {\"bbox_2d\": [540, 893, 765, 906], \"label\": \"Text\"}, {\"bbox_2d\": [540, 910, 757, 923], \"label\": \"Text\"}, {\"bbox_2d\": [540, 928, 660, 941], \"label\": \"Text\"}, {\"bbox_2d\": [540, 945, 837, 958], \"label\": \"Text\"}, {\"bbox_2d\": [540, 846, 612, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 240, 386, 843], \"label\": \"Table\"}, {\"bbox_2d\": [540, 555, 805, 809], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [325, 101, 707, 113], \"label\": \"Page-header\"}, {\"bbox_2d\": [442, 159, 595, 177], \"label\": \"Title\"}, {\"bbox_2d\": [275, 200, 760, 219], \"label\": \"Title\"}, {\"bbox_2d\": [175, 244, 861, 285], \"label\": \"Text\"}, {\"bbox_2d\": [289, 324, 748, 361], \"label\": \"Table\"}, {\"bbox_2d\": [156, 386, 880, 427], \"label\": \"Text\"}, {\"bbox_2d\": [156, 478, 878, 501], \"label\": \"Text\"}, {\"bbox_2d\": [171, 509, 586, 521], \"label\": \"Text\"}, {\"bbox_2d\": [171, 529, 556, 541], \"label\": \"Text\"}, {\"bbox_2d\": [156, 549, 878, 575], \"label\": \"Text\"}, {\"bbox_2d\": [156, 623, 878, 664], \"label\": \"Text\"}, {\"bbox_2d\": [171, 672, 845, 684], \"label\": \"Text\"}, {\"bbox_2d\": [156, 692, 878, 719], \"label\": \"Text\"}, {\"bbox_2d\": [171, 727, 540, 739], \"label\": \"Text\"}, {\"bbox_2d\": [156, 747, 878, 774], \"label\": \"Text\"}, {\"bbox_2d\": [156, 821, 878, 885], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 887, 878, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 906, 878, 930], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 599, 738, 610], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 451, 555, 462], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [324, 102, 706, 114], \"label\": \"Page-header\"}, {\"bbox_2d\": [442, 160, 601, 178], \"label\": \"Title\"}, {\"bbox_2d\": [275, 202, 768, 220], \"label\": \"Title\"}, {\"bbox_2d\": [175, 244, 867, 285], \"label\": \"Text\"}, {\"bbox_2d\": [156, 387, 884, 428], \"label\": \"Text\"}, {\"bbox_2d\": [156, 475, 884, 502], \"label\": \"Text\"}, {\"bbox_2d\": [156, 550, 884, 576], \"label\": \"Text\"}, {\"bbox_2d\": [171, 530, 561, 542], \"label\": \"Text\"}, {\"bbox_2d\": [171, 510, 591, 522], \"label\": \"Text\"}, {\"bbox_2d\": [156, 451, 561, 463], \"label\": \"Section-header\"}, {\"bbox_2d\": [288, 324, 756, 363], \"label\": \"Table\"}, {\"bbox_2d\": [156, 599, 742, 611], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 624, 884, 665], \"label\": \"Text\"}, {\"bbox_2d\": [171, 673, 850, 685], \"label\": \"Text\"}, {\"bbox_2d\": [156, 693, 884, 720], \"label\": \"Text\"}, {\"bbox_2d\": [171, 728, 544, 739], \"label\": \"Text\"}, {\"bbox_2d\": [156, 747, 884, 774], \"label\": \"Text\"}, {\"bbox_2d\": [156, 821, 883, 886], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 888, 883, 908], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 910, 883, 930], \"label\": \"Footnote\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [547, 43, 926, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 43, 210, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 104, 926, 742], \"label\": \"Picture\"}, {\"bbox_2d\": [282, 747, 483, 758], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 691, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 45, 210, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [280, 747, 487, 758], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [276, 99, 928, 745], \"label\": \"Picture\"}, {\"bbox_2d\": [546, 46, 926, 57], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 97, 331, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 142, 911, 757], \"label\": \"Text\"}, {\"bbox_2d\": [121, 771, 911, 949], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 97, 331, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 141, 911, 759], \"label\": \"Text\"}, {\"bbox_2d\": [121, 770, 911, 950], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 48, 567, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [129, 71, 609, 84], \"label\": \"Text\"}, {\"bbox_2d\": [749, 987, 975, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [120, 106, 977, 577], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [130, 45, 568, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 70, 610, 85], \"label\": \"Text\"}, {\"bbox_2d\": [119, 106, 975, 392], \"label\": \"Table\"}, {\"bbox_2d\": [121, 408, 231, 428], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 439, 976, 576], \"label\": \"Table\"}, {\"bbox_2d\": [749, 987, 976, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 75, 945, 121], \"label\": \"Text\"}, {\"bbox_2d\": [91, 139, 945, 403], \"label\": \"Text\"}, {\"bbox_2d\": [91, 491, 945, 819], \"label\": \"Text\"}, {\"bbox_2d\": [91, 834, 945, 943], \"label\": \"Text\"}, {\"bbox_2d\": [508, 994, 527, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 448, 455, 466], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 75, 944, 120], \"label\": \"Text\"}, {\"bbox_2d\": [91, 137, 944, 402], \"label\": \"Text\"}, {\"bbox_2d\": [91, 490, 944, 816], \"label\": \"Text\"}, {\"bbox_2d\": [91, 834, 944, 941], \"label\": \"Text\"}, {\"bbox_2d\": [508, 993, 527, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 446, 455, 463], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [94, 53, 298, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [267, 401, 913, 820], \"label\": \"Picture\"}, {\"bbox_2d\": [404, 848, 647, 867], \"label\": \"Caption\"}, {\"bbox_2d\": [510, 981, 538, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [141, 124, 938, 331], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [94, 53, 316, 84], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 107, 955, 351], \"label\": \"Picture\"}, {\"bbox_2d\": [184, 394, 917, 818], \"label\": \"Picture\"}, {\"bbox_2d\": [476, 818, 619, 833], \"label\": \"Text\"}, {\"bbox_2d\": [407, 851, 647, 866], \"label\": \"Caption\"}, {\"bbox_2d\": [413, 374, 651, 388], \"label\": \"Caption\"}, {\"bbox_2d\": [512, 981, 544, 996], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [220, 38, 814, 47], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 81, 1009, 138], \"label\": \"Text\"}, {\"bbox_2d\": [43, 169, 500, 176], \"label\": \"Text\"}, {\"bbox_2d\": [25, 183, 1009, 238], \"label\": \"Text\"}, {\"bbox_2d\": [25, 267, 1009, 323], \"label\": \"Text\"}, {\"bbox_2d\": [25, 352, 1009, 373], \"label\": \"Text\"}, {\"bbox_2d\": [25, 402, 1009, 423], \"label\": \"Text\"}, {\"bbox_2d\": [25, 452, 1009, 509], \"label\": \"Text\"}, {\"bbox_2d\": [25, 588, 1009, 609], \"label\": \"Text\"}, {\"bbox_2d\": [25, 619, 1009, 663], \"label\": \"Text\"}, {\"bbox_2d\": [25, 438, 326, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 335, 396, 346], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 388, 387, 399], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 250, 244, 261], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 150, 326, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 113, 1009, 138], \"label\": \"Text\"}, {\"bbox_2d\": [25, 27, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 520, 1009, 580], \"label\": \"Table\"}, {\"bbox_2d\": [25, 605, 1009, 635], \"label\": \"Text\"}, {\"bbox_2d\": [25, 619, 1009, 663], \"label\": \"Text\"}, {\"bbox_2d\": [504, 688, 529, 699], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [219, 37, 815, 48], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 150, 325, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 250, 245, 261], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 335, 397, 346], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 387, 387, 398], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 438, 327, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 80, 1009, 136], \"label\": \"Text\"}, {\"bbox_2d\": [25, 181, 1009, 237], \"label\": \"Text\"}, {\"bbox_2d\": [25, 266, 1009, 322], \"label\": \"Text\"}, {\"bbox_2d\": [25, 351, 1009, 373], \"label\": \"Text\"}, {\"bbox_2d\": [25, 403, 1009, 425], \"label\": \"Text\"}, {\"bbox_2d\": [25, 454, 1009, 510], \"label\": \"Text\"}, {\"bbox_2d\": [25, 587, 1009, 609], \"label\": \"Text\"}, {\"bbox_2d\": [25, 618, 1009, 663], \"label\": \"Text\"}, {\"bbox_2d\": [504, 686, 529, 697], \"label\": \"Page-footer\"}, {\"bbox_2d\": [43, 165, 499, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 520, 1008, 578], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [76, 51, 471, 71], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 83, 158, 92], \"label\": \"Text\"}, {\"bbox_2d\": [85, 121, 951, 819], \"label\": \"Table\"}, {\"bbox_2d\": [85, 834, 316, 845], \"label\": \"Text\"}, {\"bbox_2d\": [76, 999, 420, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [75, 48, 471, 72], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 80, 158, 91], \"label\": \"Text\"}, {\"bbox_2d\": [82, 119, 956, 819], \"label\": \"Table\"}, {\"bbox_2d\": [84, 831, 319, 845], \"label\": \"Text\"}, {\"bbox_2d\": [74, 999, 420, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [742, 36, 961, 46], \"label\": \"Page-header\"}, {\"bbox_2d\": [745, 70, 961, 989], \"label\": \"Text\"}, {\"bbox_2d\": [98, 69, 169, 989], \"label\": \"Text\"}, {\"bbox_2d\": [535, 69, 607, 989], \"label\": \"Text\"}, {\"bbox_2d\": [423, 69, 494, 989], \"label\": \"Text\"}, {\"bbox_2d\": [211, 69, 284, 989], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [814, 35, 988, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 71, 210, 973], \"label\": \"Text\"}, {\"bbox_2d\": [216, 71, 302, 973], \"label\": \"Text\"}, {\"bbox_2d\": [322, 71, 408, 973], \"label\": \"Text\"}, {\"bbox_2d\": [428, 71, 509, 973], \"label\": \"Text\"}, {\"bbox_2d\": [535, 71, 626, 973], \"label\": \"Text\"}, {\"bbox_2d\": [641, 71, 731, 973], \"label\": \"Text\"}, {\"bbox_2d\": [747, 71, 826, 973], \"label\": \"Text\"}, {\"bbox_2d\": [854, 71, 936, 973], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 102, 544, 123], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 148, 243, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 176, 966, 439], \"label\": \"Table\"}, {\"bbox_2d\": [121, 446, 947, 463], \"label\": \"Text\"}, {\"bbox_2d\": [121, 466, 941, 485], \"label\": \"Text\"}, {\"bbox_2d\": [158, 488, 961, 507], \"label\": \"Text\"}, {\"bbox_2d\": [862, 1002, 973, 1015], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [154, 106, 544, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 148, 243, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 174, 966, 442], \"label\": \"Table\"}, {\"bbox_2d\": [120, 446, 952, 465], \"label\": \"Text\"}, {\"bbox_2d\": [158, 467, 942, 486], \"label\": \"Text\"}, {\"bbox_2d\": [158, 488, 963, 507], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 84, 688, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 413, 894, 443], \"label\": \"Text\"}, {\"bbox_2d\": [293, 460, 790, 475], \"label\": \"Text\"}, {\"bbox_2d\": [293, 491, 608, 505], \"label\": \"Text\"}, {\"bbox_2d\": [293, 522, 828, 536], \"label\": \"Text\"}, {\"bbox_2d\": [218, 554, 513, 573], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 587, 845, 715], \"label\": \"Text\"}, {\"bbox_2d\": [317, 833, 539, 843], \"label\": \"Text\"}, {\"bbox_2d\": [797, 820, 844, 830], \"label\": \"Text\"}, {\"bbox_2d\": [293, 95, 932, 386], \"label\": \"Table\"}, {\"bbox_2d\": [644, 979, 932, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 86, 690, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 98, 930, 387], \"label\": \"Table\"}, {\"bbox_2d\": [293, 415, 894, 444], \"label\": \"Text\"}, {\"bbox_2d\": [293, 462, 789, 475], \"label\": \"Text\"}, {\"bbox_2d\": [293, 493, 608, 507], \"label\": \"Text\"}, {\"bbox_2d\": [293, 525, 829, 538], \"label\": \"Text\"}, {\"bbox_2d\": [217, 556, 513, 573], \"label\": \"Section-header\"}, {\"bbox_2d\": [493, 586, 662, 597], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 609, 844, 844], \"label\": \"Text\"}, {\"bbox_2d\": [644, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [438, 73, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [298, 94, 334, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 113, 499, 132], \"label\": \"Text\"}, {\"bbox_2d\": [131, 140, 500, 188], \"label\": \"Text\"}, {\"bbox_2d\": [130, 194, 500, 324], \"label\": \"Text\"}, {\"bbox_2d\": [130, 331, 500, 520], \"label\": \"Text\"}, {\"bbox_2d\": [130, 517, 500, 562], \"label\": \"Text\"}, {\"bbox_2d\": [130, 569, 501, 788], \"label\": \"Text\"}, {\"bbox_2d\": [130, 795, 500, 953], \"label\": \"Text\"}, {\"bbox_2d\": [130, 960, 500, 980], \"label\": \"Text\"}, {\"bbox_2d\": [526, 924, 894, 980], \"label\": \"Text\"}, {\"bbox_2d\": [526, 785, 894, 915], \"label\": \"Text\"}, {\"bbox_2d\": [526, 352, 894, 377], \"label\": \"Text\"}, {\"bbox_2d\": [526, 247, 894, 282], \"label\": \"Text\"}, {\"bbox_2d\": [526, 153, 894, 203], \"label\": \"Text\"}, {\"bbox_2d\": [526, 113, 894, 147], \"label\": \"Text\"}, {\"bbox_2d\": [681, 219, 739, 228], \"label\": \"Section-header\"}, {\"bbox_2d\": [645, 298, 778, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [638, 325, 783, 334], \"label\": \"Section-header\"}, {\"bbox_2d\": [519, 405, 894, 754], \"label\": \"Picture\"}, {\"bbox_2d\": [572, 597, 582, 605], \"label\": \"Text\"}, {\"bbox_2d\": [569, 609, 580, 634], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [298, 93, 334, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [438, 73, 587, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [693, 93, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 111, 500, 134], \"label\": \"Text\"}, {\"bbox_2d\": [130, 138, 500, 189], \"label\": \"Text\"}, {\"bbox_2d\": [130, 193, 501, 326], \"label\": \"Text\"}, {\"bbox_2d\": [130, 329, 518, 512], \"label\": \"Text\"}, {\"bbox_2d\": [130, 515, 501, 564], \"label\": \"Text\"}, {\"bbox_2d\": [130, 567, 520, 786], \"label\": \"Text\"}, {\"bbox_2d\": [131, 789, 167, 799], \"label\": \"Text\"}, {\"bbox_2d\": [130, 802, 519, 956], \"label\": \"Text\"}, {\"bbox_2d\": [130, 959, 499, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 111, 894, 148], \"label\": \"Text\"}, {\"bbox_2d\": [526, 152, 896, 203], \"label\": \"Text\"}, {\"bbox_2d\": [681, 218, 740, 228], \"label\": \"Text\"}, {\"bbox_2d\": [526, 246, 895, 283], \"label\": \"Text\"}, {\"bbox_2d\": [526, 353, 894, 377], \"label\": \"Text\"}, {\"bbox_2d\": [526, 781, 898, 915], \"label\": \"Text\"}, {\"bbox_2d\": [526, 918, 898, 982], \"label\": \"Text\"}, {\"bbox_2d\": [524, 405, 825, 605], \"label\": \"Picture\"}, {\"bbox_2d\": [545, 607, 890, 759], \"label\": \"Picture\"}, {\"bbox_2d\": [644, 298, 777, 308], \"label\": \"Text\"}, {\"bbox_2d\": [638, 326, 781, 336], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [24, 25, 164, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [26, 111, 223, 123], \"label\": \"Section-header\"}, {\"bbox_2d\": [211, 144, 814, 284], \"label\": \"Picture\"}, {\"bbox_2d\": [25, 309, 901, 319], \"label\": \"Text\"}, {\"bbox_2d\": [25, 338, 1009, 359], \"label\": \"Text\"}, {\"bbox_2d\": [25, 372, 1009, 392], \"label\": \"Text\"}, {\"bbox_2d\": [25, 405, 1009, 447], \"label\": \"Text\"}, {\"bbox_2d\": [25, 665, 1009, 692], \"label\": \"Text\"}, {\"bbox_2d\": [26, 457, 142, 469], \"label\": \"Section-header\"}, {\"bbox_2d\": [376, 489, 653, 631], \"label\": \"Picture\"}, {\"bbox_2d\": [25, 688, 339, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 681, 582, 694], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [25, 24, 164, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 113, 223, 124], \"label\": \"Section-header\"}, {\"bbox_2d\": [211, 140, 488, 283], \"label\": \"Picture\"}, {\"bbox_2d\": [547, 144, 821, 286], \"label\": \"Picture\"}, {\"bbox_2d\": [25, 311, 902, 319], \"label\": \"Text\"}, {\"bbox_2d\": [25, 339, 1009, 359], \"label\": \"Text\"}, {\"bbox_2d\": [25, 372, 1009, 392], \"label\": \"Text\"}, {\"bbox_2d\": [25, 406, 1009, 445], \"label\": \"Text\"}, {\"bbox_2d\": [25, 458, 141, 469], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 665, 1009, 685], \"label\": \"Text\"}, {\"bbox_2d\": [25, 685, 582, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 695, 338, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [378, 488, 654, 627], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 87, 408, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 117, 895, 146], \"label\": \"Text\"}, {\"bbox_2d\": [274, 151, 874, 165], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 170, 390, 184], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 202, 510, 216], \"label\": \"Text\"}, {\"bbox_2d\": [272, 221, 911, 530], \"label\": \"Picture\"}, {\"bbox_2d\": [274, 557, 917, 603], \"label\": \"Text\"}, {\"bbox_2d\": [272, 618, 913, 947], \"label\": \"Table\"}, {\"bbox_2d\": [274, 613, 611, 626], \"label\": \"Caption\"}, {\"bbox_2d\": [101, 979, 385, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 89, 408, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [271, 627, 912, 950], \"label\": \"Table\"}, {\"bbox_2d\": [272, 615, 612, 626], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 559, 911, 603], \"label\": \"Text\"}, {\"bbox_2d\": [273, 204, 509, 217], \"label\": \"Text\"}, {\"bbox_2d\": [273, 118, 889, 146], \"label\": \"Text\"}, {\"bbox_2d\": [273, 151, 869, 166], \"label\": \"Text\"}, {\"bbox_2d\": [273, 172, 388, 186], \"label\": \"Text\"}, {\"bbox_2d\": [479, 228, 641, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [297, 251, 567, 263], \"label\": \"Text\"}, {\"bbox_2d\": [310, 263, 574, 275], \"label\": \"Text\"}, {\"bbox_2d\": [297, 275, 662, 287], \"label\": \"Text\"}, {\"bbox_2d\": [297, 310, 472, 322], \"label\": \"Text\"}, {\"bbox_2d\": [310, 334, 648, 345], \"label\": \"Text\"}, {\"bbox_2d\": [324, 345, 709, 357], \"label\": \"Text\"}, {\"bbox_2d\": [310, 357, 662, 369], \"label\": \"Text\"}, {\"bbox_2d\": [297, 498, 695, 510], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 25, 164, 33], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 116, 379, 127], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 396, 644, 406], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 617, 1009, 633], \"label\": \"Text\"}, {\"bbox_2d\": [25, 144, 1012, 345], \"label\": \"Table\"}, {\"bbox_2d\": [25, 349, 833, 359], \"label\": \"Text\"}, {\"bbox_2d\": [25, 361, 1009, 378], \"label\": \"Text\"}, {\"bbox_2d\": [24, 421, 1007, 612], \"label\": \"Table\"}, {\"bbox_2d\": [25, 624, 1009, 633], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [25, 24, 164, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 116, 380, 127], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 396, 644, 406], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 350, 947, 358], \"label\": \"Text\"}, {\"bbox_2d\": [25, 359, 1009, 376], \"label\": \"Text\"}, {\"bbox_2d\": [25, 617, 1009, 634], \"label\": \"Text\"}, {\"bbox_2d\": [24, 422, 1005, 611], \"label\": \"Table\"}, {\"bbox_2d\": [23, 142, 1007, 345], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [142, 92, 889, 146], \"label\": \"Title\"}, {\"bbox_2d\": [135, 172, 897, 189], \"label\": \"Text\"}, {\"bbox_2d\": [33, 257, 61, 711], \"label\": \"Page-header\"}, {\"bbox_2d\": [339, 204, 697, 217], \"label\": \"Text\"}, {\"bbox_2d\": [321, 233, 715, 245], \"label\": \"Text\"}, {\"bbox_2d\": [260, 261, 771, 331], \"label\": \"Text\"}, {\"bbox_2d\": [355, 347, 677, 360], \"label\": \"Text\"}, {\"bbox_2d\": [319, 376, 718, 388], \"label\": \"Text\"}, {\"bbox_2d\": [121, 467, 911, 650], \"label\": \"Text\"}, {\"bbox_2d\": [121, 679, 457, 691], \"label\": \"Text\"}, {\"bbox_2d\": [397, 405, 630, 417], \"label\": \"Text\"}, {\"bbox_2d\": [473, 437, 561, 453], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 952, 479, 964], \"label\": \"Footnote\"}, {\"bbox_2d\": [501, 989, 511, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [143, 94, 889, 146], \"label\": \"Title\"}, {\"bbox_2d\": [135, 172, 895, 189], \"label\": \"Text\"}, {\"bbox_2d\": [319, 201, 713, 246], \"label\": \"Text\"}, {\"bbox_2d\": [263, 258, 769, 332], \"label\": \"Text\"}, {\"bbox_2d\": [316, 344, 716, 389], \"label\": \"Text\"}, {\"bbox_2d\": [401, 404, 631, 418], \"label\": \"Text\"}, {\"bbox_2d\": [471, 435, 562, 454], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 467, 911, 651], \"label\": \"Text\"}, {\"bbox_2d\": [121, 680, 457, 693], \"label\": \"Text\"}, {\"bbox_2d\": [121, 951, 477, 964], \"label\": \"Footnote\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Text\"}, {\"bbox_2d\": [31, 258, 61, 712], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [71, 38, 361, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [71, 78, 480, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [69, 131, 484, 225], \"label\": \"Text\"}, {\"bbox_2d\": [69, 271, 484, 702], \"label\": \"Text\"}, {\"bbox_2d\": [523, 305, 943, 534], \"label\": \"Text\"}, {\"bbox_2d\": [523, 532, 943, 842], \"label\": \"Text\"}, {\"bbox_2d\": [69, 786, 483, 864], \"label\": \"Text\"}, {\"bbox_2d\": [523, 839, 943, 961], \"label\": \"Text\"}, {\"bbox_2d\": [69, 862, 483, 959], \"label\": \"Text\"}, {\"bbox_2d\": [487, 968, 523, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [69, 240, 266, 262], \"label\": \"Section-header\"}, {\"bbox_2d\": [523, 79, 943, 307], \"label\": \"Text\"}, {\"bbox_2d\": [71, 700, 484, 777], \"label\": \"Table\"}, {\"bbox_2d\": [95, 700, 484, 777], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [72, 39, 373, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [71, 82, 486, 118], \"label\": \"Section-header\"}, {\"bbox_2d\": [70, 244, 263, 260], \"label\": \"Section-header\"}, {\"bbox_2d\": [523, 81, 950, 306], \"label\": \"Text\"}, {\"bbox_2d\": [70, 129, 491, 223], \"label\": \"Text\"}, {\"bbox_2d\": [68, 271, 495, 702], \"label\": \"Text\"}, {\"bbox_2d\": [524, 307, 951, 533], \"label\": \"Text\"}, {\"bbox_2d\": [523, 534, 947, 840], \"label\": \"Text\"}, {\"bbox_2d\": [68, 789, 491, 862], \"label\": \"Text\"}, {\"bbox_2d\": [68, 702, 502, 777], \"label\": \"Table\"}, {\"bbox_2d\": [523, 842, 948, 960], \"label\": \"Text\"}, {\"bbox_2d\": [69, 865, 492, 958], \"label\": \"Text\"}, {\"bbox_2d\": [488, 969, 525, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 122, 736, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 418, 705, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 529, 603, 548], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 164, 931, 193], \"label\": \"Text\"}, {\"bbox_2d\": [293, 368, 896, 397], \"label\": \"Text\"}, {\"bbox_2d\": [293, 448, 905, 477], \"label\": \"Text\"}, {\"bbox_2d\": [293, 495, 624, 509], \"label\": \"Text\"}, {\"bbox_2d\": [293, 560, 877, 588], \"label\": \"Text\"}, {\"bbox_2d\": [905, 977, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 200, 388, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 220, 391, 233], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 239, 373, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 258, 410, 272], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 278, 426, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 297, 427, 311], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 317, 417, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 336, 426, 350], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 735, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 420, 706, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 531, 603, 548], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 496, 624, 509], \"label\": \"Text\"}, {\"bbox_2d\": [293, 449, 905, 478], \"label\": \"Text\"}, {\"bbox_2d\": [293, 560, 877, 589], \"label\": \"Text\"}, {\"bbox_2d\": [293, 369, 897, 398], \"label\": \"Text\"}, {\"bbox_2d\": [293, 166, 930, 195], \"label\": \"Text\"}, {\"bbox_2d\": [293, 199, 388, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 218, 391, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 238, 375, 253], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 258, 411, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 277, 425, 293], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 297, 425, 312], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 316, 417, 332], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 336, 425, 351], \"label\": \"List-item\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 86, 911, 921], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 84, 914, 925], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 83, 442, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 109, 483, 122], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 123, 351, 137], \"label\": \"Text\"}, {\"bbox_2d\": [459, 123, 906, 151], \"label\": \"Text\"}, {\"bbox_2d\": [459, 237, 893, 279], \"label\": \"Text\"}, {\"bbox_2d\": [459, 522, 894, 607], \"label\": \"Text\"}, {\"bbox_2d\": [273, 878, 894, 908], \"label\": \"Text\"}, {\"bbox_2d\": [273, 522, 417, 536], \"label\": \"Text\"}, {\"bbox_2d\": [273, 237, 327, 251], \"label\": \"Text\"}, {\"bbox_2d\": [459, 286, 739, 515], \"label\": \"Picture\"}, {\"bbox_2d\": [479, 157, 729, 228], \"label\": \"Picture\"}, {\"bbox_2d\": [479, 614, 744, 842], \"label\": \"Picture\"}, {\"bbox_2d\": [273, 979, 378, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [99, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 441, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [272, 110, 485, 121], \"label\": \"Caption\"}, {\"bbox_2d\": [271, 125, 907, 848], \"label\": \"Table\"}, {\"bbox_2d\": [273, 880, 894, 909], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [237, 133, 906, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [1015, 139, 1026, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [512, 234, 903, 337], \"label\": \"Text\"}, {\"bbox_2d\": [88, 257, 479, 384], \"label\": \"Text\"}, {\"bbox_2d\": [512, 391, 901, 806], \"label\": \"Text\"}, {\"bbox_2d\": [88, 414, 476, 606], \"label\": \"Text\"}, {\"bbox_2d\": [88, 637, 470, 786], \"label\": \"Text\"}, {\"bbox_2d\": [512, 839, 894, 897], \"label\": \"Text\"}, {\"bbox_2d\": [88, 817, 476, 899], \"label\": \"Text\"}, {\"bbox_2d\": [512, 368, 612, 381], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 234, 180, 248], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [1016, 137, 1028, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [237, 126, 907, 158], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 231, 181, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [512, 365, 612, 382], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 253, 479, 383], \"label\": \"Text\"}, {\"bbox_2d\": [87, 409, 478, 606], \"label\": \"Text\"}, {\"bbox_2d\": [87, 633, 474, 786], \"label\": \"Text\"}, {\"bbox_2d\": [87, 812, 477, 897], \"label\": \"Text\"}, {\"bbox_2d\": [512, 834, 893, 896], \"label\": \"Text\"}, {\"bbox_2d\": [512, 386, 904, 807], \"label\": \"Text\"}, {\"bbox_2d\": [512, 229, 903, 337], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 301, 493, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 372, 351, 384], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 586, 243, 598], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 681, 304, 693], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 705, 407, 883], \"label\": \"Text\"}, {\"bbox_2d\": [98, 397, 381, 551], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [100, 302, 495, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [100, 373, 352, 385], \"label\": \"Section-header\"}, {\"bbox_2d\": [100, 397, 169, 408], \"label\": \"Section-header\"}, {\"bbox_2d\": [100, 682, 306, 694], \"label\": \"Section-header\"}, {\"bbox_2d\": [100, 587, 243, 598], \"label\": \"Section-header\"}, {\"bbox_2d\": [100, 420, 300, 432], \"label\": \"Text\"}, {\"bbox_2d\": [100, 444, 311, 456], \"label\": \"Text\"}, {\"bbox_2d\": [100, 468, 340, 480], \"label\": \"Text\"}, {\"bbox_2d\": [100, 491, 350, 503], \"label\": \"Text\"}, {\"bbox_2d\": [100, 515, 354, 527], \"label\": \"Text\"}, {\"bbox_2d\": [100, 539, 379, 551], \"label\": \"Text\"}, {\"bbox_2d\": [100, 610, 202, 622], \"label\": \"Text\"}, {\"bbox_2d\": [100, 634, 192, 646], \"label\": \"Text\"}, {\"bbox_2d\": [100, 705, 410, 883], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 34, 411, 89], \"label\": \"Picture\"}, {\"bbox_2d\": [154, 139, 918, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 184, 918, 228], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 246, 917, 318], \"label\": \"Text\"}, {\"bbox_2d\": [121, 372, 917, 431], \"label\": \"Text\"}, {\"bbox_2d\": [121, 449, 594, 462], \"label\": \"Text\"}, {\"bbox_2d\": [154, 482, 917, 539], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 543, 917, 602], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 606, 917, 675], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 678, 917, 736], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 740, 917, 797], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 798, 917, 824], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 843, 917, 871], \"label\": \"Text\"}, {\"bbox_2d\": [121, 890, 917, 933], \"label\": \"Text\"}, {\"bbox_2d\": [533, 971, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [154, 338, 461, 354], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [154, 136, 918, 181], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 181, 918, 226], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 480, 918, 539], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 540, 918, 599], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 599, 918, 674], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 674, 918, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 734, 918, 794], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 794, 918, 824], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 841, 918, 871], \"label\": \"Text\"}, {\"bbox_2d\": [123, 888, 918, 933], \"label\": \"Text\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [123, 448, 591, 462], \"label\": \"Text\"}, {\"bbox_2d\": [123, 370, 918, 430], \"label\": \"Text\"}, {\"bbox_2d\": [123, 243, 918, 318], \"label\": \"Text\"}, {\"bbox_2d\": [186, 335, 465, 353], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 34, 409, 87], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 71, 506, 114], \"label\": \"Text\"}, {\"bbox_2d\": [91, 112, 506, 167], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 192], \"label\": \"Text\"}, {\"bbox_2d\": [536, 190, 951, 387], \"label\": \"Text\"}, {\"bbox_2d\": [536, 442, 951, 524], \"label\": \"Text\"}, {\"bbox_2d\": [91, 394, 506, 648], \"label\": \"Text\"}, {\"bbox_2d\": [91, 647, 506, 800], \"label\": \"Text\"}, {\"bbox_2d\": [91, 800, 506, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 858, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 713, 951, 801], \"label\": \"Text\"}, {\"bbox_2d\": [536, 577, 951, 711], \"label\": \"Text\"}, {\"bbox_2d\": [536, 525, 951, 560], \"label\": \"Formula\"}, {\"bbox_2d\": [674, 500, 951, 550], \"label\": \"Formula\"}, {\"bbox_2d\": [624, 369, 951, 404], \"label\": \"Formula\"}, {\"bbox_2d\": [137, 348, 462, 374], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 199, 420, 209], \"label\": \"Text\"}, {\"bbox_2d\": [91, 290, 506, 317], \"label\": \"Text\"}, {\"bbox_2d\": [198, 159, 506, 188], \"label\": \"Formula\"}, {\"bbox_2d\": [140, 218, 506, 250], \"label\": \"Formula\"}, {\"bbox_2d\": [135, 253, 506, 276], \"label\": \"Formula\"}, {\"bbox_2d\": [135, 279, 506, 301], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 801, 506, 811], \"label\": \"Text\"}, {\"bbox_2d\": [552, 813, 951, 849], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 949, 506, 977], \"label\": \"Text\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 189], \"label\": \"Text\"}, {\"bbox_2d\": [536, 191, 951, 385], \"label\": \"Text\"}, {\"bbox_2d\": [91, 69, 506, 114], \"label\": \"Text\"}, {\"bbox_2d\": [91, 116, 506, 144], \"label\": \"Text\"}, {\"bbox_2d\": [91, 197, 419, 210], \"label\": \"Text\"}, {\"bbox_2d\": [200, 156, 506, 195], \"label\": \"Formula\"}, {\"bbox_2d\": [620, 398, 951, 437], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 289, 506, 317], \"label\": \"Text\"}, {\"bbox_2d\": [136, 348, 461, 373], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 441, 951, 517], \"label\": \"Text\"}, {\"bbox_2d\": [91, 394, 506, 647], \"label\": \"Text\"}, {\"bbox_2d\": [536, 574, 951, 710], \"label\": \"Text\"}, {\"bbox_2d\": [91, 649, 506, 797], \"label\": \"Text\"}, {\"bbox_2d\": [536, 712, 951, 810], \"label\": \"Text\"}, {\"bbox_2d\": [645, 530, 951, 570], \"label\": \"Formula\"}, {\"bbox_2d\": [608, 812, 951, 855], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 858, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 799, 506, 947], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 25, 163, 34], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 113, 1009, 185], \"label\": \"Text\"}, {\"bbox_2d\": [25, 198, 1009, 346], \"label\": \"Text\"}, {\"bbox_2d\": [25, 359, 1009, 399], \"label\": \"Text\"}, {\"bbox_2d\": [25, 412, 1009, 441], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [25, 24, 164, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 114, 1009, 183], \"label\": \"Text\"}, {\"bbox_2d\": [25, 196, 1009, 345], \"label\": \"Text\"}, {\"bbox_2d\": [25, 359, 1009, 399], \"label\": \"Text\"}, {\"bbox_2d\": [25, 412, 1009, 442], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [790, 74, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [497, 94, 528, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 74, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 142, 500, 178], \"label\": \"Text\"}, {\"bbox_2d\": [130, 183, 501, 256], \"label\": \"Text\"}, {\"bbox_2d\": [130, 261, 500, 404], \"label\": \"Text\"}, {\"bbox_2d\": [130, 409, 501, 584], \"label\": \"Text\"}, {\"bbox_2d\": [130, 589, 501, 639], \"label\": \"Text\"}, {\"bbox_2d\": [130, 644, 501, 666], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 894, 280], \"label\": \"Text\"}, {\"bbox_2d\": [526, 285, 895, 311], \"label\": \"Text\"}, {\"bbox_2d\": [526, 317, 895, 372], \"label\": \"Text\"}, {\"bbox_2d\": [526, 377, 895, 467], \"label\": \"Text\"}, {\"bbox_2d\": [526, 498, 895, 535], \"label\": \"Text\"}, {\"bbox_2d\": [526, 569, 894, 592], \"label\": \"Text\"}, {\"bbox_2d\": [526, 644, 894, 666], \"label\": \"Text\"}, {\"bbox_2d\": [526, 481, 751, 491], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 549, 586, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [644, 605, 777, 615], \"label\": \"Section-header\"}, {\"bbox_2d\": [675, 623, 748, 633], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 727, 862, 939], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [497, 94, 530, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 142, 500, 178], \"label\": \"Text\"}, {\"bbox_2d\": [131, 181, 500, 231], \"label\": \"Text\"}, {\"bbox_2d\": [130, 234, 499, 284], \"label\": \"Text\"}, {\"bbox_2d\": [130, 287, 501, 402], \"label\": \"Text\"}, {\"bbox_2d\": [130, 405, 501, 455], \"label\": \"Text\"}, {\"bbox_2d\": [130, 458, 501, 586], \"label\": \"Text\"}, {\"bbox_2d\": [131, 590, 501, 639], \"label\": \"Text\"}, {\"bbox_2d\": [131, 642, 495, 666], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 896, 282], \"label\": \"Text\"}, {\"bbox_2d\": [526, 285, 895, 308], \"label\": \"Text\"}, {\"bbox_2d\": [526, 311, 894, 348], \"label\": \"Text\"}, {\"bbox_2d\": [525, 351, 895, 465], \"label\": \"Text\"}, {\"bbox_2d\": [526, 499, 895, 535], \"label\": \"Text\"}, {\"bbox_2d\": [526, 568, 890, 591], \"label\": \"Text\"}, {\"bbox_2d\": [668, 479, 752, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 549, 582, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [675, 624, 745, 634], \"label\": \"Section-header\"}, {\"bbox_2d\": [645, 605, 776, 615], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 642, 890, 666], \"label\": \"Text\"}, {\"bbox_2d\": [322, 702, 862, 934], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [131, 98, 754, 110], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 126, 571, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 154, 577, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 182, 588, 194], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 210, 540, 222], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 238, 736, 250], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 266, 898, 278], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 294, 484, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 322, 580, 334], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 350, 554, 362], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 378, 554, 390], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 406, 433, 418], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 434, 560, 446], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 462, 467, 474], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 491, 588, 503], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 519, 521, 531], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 547, 579, 560], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 576, 703, 588], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 604, 769, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 632, 591, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 660, 617, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 690, 484, 702], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 718, 528, 730], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 746, 542, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 774, 511, 786], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 802, 511, 814], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 830, 588, 842], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 858, 659, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 891, 713, 903], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 919, 669, 931], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 947, 686, 959], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [131, 96, 755, 110], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 124, 572, 139], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 153, 577, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 181, 587, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 209, 539, 224], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 238, 736, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 266, 897, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 294, 484, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 323, 579, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 351, 553, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 379, 554, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 408, 433, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 436, 559, 450], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 464, 466, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 493, 586, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 521, 521, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 549, 579, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 578, 701, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 606, 769, 620], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 635, 590, 649], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 663, 617, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 691, 485, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 720, 528, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 748, 542, 762], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 776, 512, 791], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 805, 512, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 833, 587, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 861, 547, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 890, 601, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 918, 557, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 946, 686, 961], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [691, 39, 928, 51], \"label\": \"Page-header\"}, {\"bbox_2d\": [112, 178, 325, 262], \"label\": \"Picture\"}, {\"bbox_2d\": [361, 140, 931, 309], \"label\": \"Picture\"}, {\"bbox_2d\": [133, 364, 845, 430], \"label\": \"Section-header\"}, {\"bbox_2d\": [555, 454, 940, 500], \"label\": \"Text\"}, {\"bbox_2d\": [555, 519, 936, 602], \"label\": \"Text\"}, {\"bbox_2d\": [133, 454, 514, 624], \"label\": \"Text\"}, {\"bbox_2d\": [555, 618, 946, 719], \"label\": \"Text\"}, {\"bbox_2d\": [133, 642, 508, 743], \"label\": \"Text\"}, {\"bbox_2d\": [133, 760, 510, 825], \"label\": \"Text\"}, {\"bbox_2d\": [133, 843, 510, 925], \"label\": \"Text\"}, {\"bbox_2d\": [756, 775, 965, 912], \"label\": \"Caption\"}, {\"bbox_2d\": [555, 747, 731, 922], \"label\": \"Picture\"}, {\"bbox_2d\": [755, 999, 928, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [691, 36, 929, 51], \"label\": \"Page-header\"}, {\"bbox_2d\": [111, 176, 329, 270], \"label\": \"Section-header\"}, {\"bbox_2d\": [361, 140, 1034, 310], \"label\": \"Picture\"}, {\"bbox_2d\": [133, 358, 854, 432], \"label\": \"Section-header\"}, {\"bbox_2d\": [133, 452, 518, 625], \"label\": \"Text\"}, {\"bbox_2d\": [133, 640, 509, 742], \"label\": \"Text\"}, {\"bbox_2d\": [133, 758, 514, 825], \"label\": \"Text\"}, {\"bbox_2d\": [133, 840, 520, 925], \"label\": \"Text\"}, {\"bbox_2d\": [555, 452, 944, 501], \"label\": \"Text\"}, {\"bbox_2d\": [555, 516, 938, 601], \"label\": \"Text\"}, {\"bbox_2d\": [555, 617, 950, 719], \"label\": \"Text\"}, {\"bbox_2d\": [554, 762, 726, 922], \"label\": \"Picture\"}, {\"bbox_2d\": [757, 773, 971, 913], \"label\": \"Caption\"}, {\"bbox_2d\": [755, 1001, 929, 1009], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 83, 914, 960], \"label\": \"Table\"}, {\"bbox_2d\": [99, 980, 367, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [100, 84, 930, 961], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 368, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [893, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [192, 95, 842, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 293], \"label\": \"Text\"}, {\"bbox_2d\": [147, 305, 419, 320], \"label\": \"Text\"}, {\"bbox_2d\": [121, 392, 911, 516], \"label\": \"Text\"}, {\"bbox_2d\": [121, 584, 911, 653], \"label\": \"Text\"}, {\"bbox_2d\": [121, 776, 911, 875], \"label\": \"Text\"}, {\"bbox_2d\": [327, 346, 911, 373], \"label\": \"Formula\"}, {\"bbox_2d\": [408, 514, 911, 570], \"label\": \"Formula\"}, {\"bbox_2d\": [397, 670, 911, 759], \"label\": \"Formula\"}, {\"bbox_2d\": [409, 886, 911, 954], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [891, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [190, 96, 841, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 293], \"label\": \"Text\"}, {\"bbox_2d\": [147, 305, 419, 321], \"label\": \"Text\"}, {\"bbox_2d\": [121, 387, 911, 516], \"label\": \"Text\"}, {\"bbox_2d\": [121, 556, 911, 652], \"label\": \"Text\"}, {\"bbox_2d\": [121, 775, 911, 873], \"label\": \"Text\"}, {\"bbox_2d\": [329, 344, 911, 375], \"label\": \"Formula\"}, {\"bbox_2d\": [409, 514, 911, 575], \"label\": \"Formula\"}, {\"bbox_2d\": [400, 670, 911, 788], \"label\": \"Formula\"}, {\"bbox_2d\": [412, 890, 911, 979], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [0, 0, 1023, 1035], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [1, -1, 1023, 1034], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 89, 419, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 119, 893, 146], \"label\": \"Text\"}, {\"bbox_2d\": [274, 154, 917, 181], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 189, 389, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 545, 905, 586], \"label\": \"Text\"}, {\"bbox_2d\": [274, 607, 888, 618], \"label\": \"Text\"}, {\"bbox_2d\": [274, 631, 614, 641], \"label\": \"Caption\"}, {\"bbox_2d\": [274, 643, 914, 891], \"label\": \"Table\"}, {\"bbox_2d\": [101, 981, 375, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 209, 699, 494], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [101, 89, 418, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 118, 890, 146], \"label\": \"Text\"}, {\"bbox_2d\": [273, 151, 911, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 188, 388, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 543, 898, 588], \"label\": \"Text\"}, {\"bbox_2d\": [273, 606, 882, 619], \"label\": \"Text\"}, {\"bbox_2d\": [272, 631, 611, 641], \"label\": \"Caption\"}, {\"bbox_2d\": [268, 643, 914, 894], \"label\": \"Table\"}, {\"bbox_2d\": [472, 212, 635, 224], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 235, 695, 272], \"label\": \"Table\"}, {\"bbox_2d\": [309, 316, 723, 355], \"label\": \"Table\"}, {\"bbox_2d\": [297, 295, 472, 306], \"label\": \"Text\"}, {\"bbox_2d\": [297, 482, 344, 494], \"label\": \"Text\"}, {\"bbox_2d\": [364, 482, 432, 494], \"label\": \"Text\"}, {\"bbox_2d\": [452, 482, 520, 494], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 34, 411, 91], \"label\": \"Picture\"}, {\"bbox_2d\": [128, 138, 613, 154], \"label\": \"List-item\"}, {\"bbox_2d\": [128, 165, 903, 194], \"label\": \"List-item\"}, {\"bbox_2d\": [128, 213, 905, 256], \"label\": \"Text\"}, {\"bbox_2d\": [128, 275, 903, 364], \"label\": \"Text\"}, {\"bbox_2d\": [128, 383, 903, 411], \"label\": \"Text\"}, {\"bbox_2d\": [128, 541, 903, 629], \"label\": \"Text\"}, {\"bbox_2d\": [128, 648, 903, 677], \"label\": \"Text\"}, {\"bbox_2d\": [128, 702, 903, 783], \"label\": \"Text\"}, {\"bbox_2d\": [128, 802, 903, 831], \"label\": \"Text\"}, {\"bbox_2d\": [128, 929, 686, 943], \"label\": \"Footnote\"}, {\"bbox_2d\": [538, 970, 546, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [128, 498, 520, 514], \"label\": \"List-item\"}, {\"bbox_2d\": [128, 457, 903, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [128, 429, 400, 445], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [133, 137, 610, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 164, 908, 194], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 455, 908, 485], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 497, 518, 512], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 211, 909, 256], \"label\": \"Text\"}, {\"bbox_2d\": [133, 273, 909, 363], \"label\": \"Text\"}, {\"bbox_2d\": [133, 380, 908, 410], \"label\": \"Text\"}, {\"bbox_2d\": [133, 539, 908, 629], \"label\": \"Text\"}, {\"bbox_2d\": [133, 646, 908, 676], \"label\": \"Text\"}, {\"bbox_2d\": [133, 708, 908, 783], \"label\": \"Text\"}, {\"bbox_2d\": [133, 802, 908, 832], \"label\": \"Text\"}, {\"bbox_2d\": [123, 928, 687, 942], \"label\": \"Footnote\"}, {\"bbox_2d\": [537, 970, 547, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [133, 428, 401, 443], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 34, 409, 87], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 31, 410, 89], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 912, 916, 941], \"label\": \"Footnote\"}, {\"bbox_2d\": [533, 968, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [288, 206, 906, 250], \"label\": \"List-item\"}, {\"bbox_2d\": [350, 139, 906, 197], \"label\": \"List-item\"}, {\"bbox_2d\": [350, 259, 906, 288], \"label\": \"List-item\"}, {\"bbox_2d\": [350, 296, 906, 325], \"label\": \"List-item\"}, {\"bbox_2d\": [350, 333, 906, 391], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 439, 895, 482], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 490, 876, 533], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 541, 827, 570], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 578, 874, 607], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 615, 878, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 652, 895, 681], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 690, 841, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 727, 879, 756], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 764, 854, 794], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 803, 894, 832], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [123, 914, 915, 942], \"label\": \"Footnote\"}, {\"bbox_2d\": [122, 136, 913, 840], \"label\": \"Table\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [122, 34, 409, 86], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [887, 42, 975, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 194, 415, 205], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 211, 930, 221], \"label\": \"Text\"}, {\"bbox_2d\": [78, 230, 838, 253], \"label\": \"Text\"}, {\"bbox_2d\": [999, 238, 1015, 301], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 268, 955, 801], \"label\": \"Table\"}, {\"bbox_2d\": [942, 994, 990, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [886, 34, 976, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [1002, 238, 1018, 302], \"label\": \"Page-header\"}, {\"bbox_2d\": [943, 991, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [78, 189, 415, 205], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 209, 933, 220], \"label\": \"Text\"}, {\"bbox_2d\": [78, 228, 841, 252], \"label\": \"Text\"}, {\"bbox_2d\": [77, 269, 954, 800], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [68, 51, 602, 67], \"label\": \"Section-header\"}, {\"bbox_2d\": [68, 74, 691, 83], \"label\": \"Text\"}, {\"bbox_2d\": [65, 107, 383, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 132, 906, 156], \"label\": \"Text\"}, {\"bbox_2d\": [94, 171, 908, 209], \"label\": \"Text\"}, {\"bbox_2d\": [94, 256, 881, 295], \"label\": \"Text\"}, {\"bbox_2d\": [94, 228, 235, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 594, 900, 618], \"label\": \"Text\"}, {\"bbox_2d\": [62, 988, 282, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [94, 312, 738, 580], \"label\": \"Table\"}, {\"bbox_2d\": [94, 628, 703, 731], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [69, 45, 601, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [69, 70, 690, 85], \"label\": \"Text\"}, {\"bbox_2d\": [61, 102, 383, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 223, 234, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 129, 915, 156], \"label\": \"Text\"}, {\"bbox_2d\": [94, 169, 918, 210], \"label\": \"Text\"}, {\"bbox_2d\": [94, 252, 890, 296], \"label\": \"Text\"}, {\"bbox_2d\": [95, 590, 908, 617], \"label\": \"Text\"}, {\"bbox_2d\": [95, 312, 738, 578], \"label\": \"Table\"}, {\"bbox_2d\": [60, 987, 282, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [94, 629, 704, 732], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 41, 994, 64], \"label\": \"Text\"}, {\"bbox_2d\": [25, 70, 1007, 104], \"label\": \"Text\"}, {\"bbox_2d\": [25, 110, 994, 133], \"label\": \"Text\"}, {\"bbox_2d\": [25, 139, 962, 173], \"label\": \"Text\"}, {\"bbox_2d\": [55, 179, 331, 189], \"label\": \"Text\"}, {\"bbox_2d\": [55, 195, 987, 230], \"label\": \"Text\"}, {\"bbox_2d\": [25, 236, 962, 259], \"label\": \"Text\"}, {\"bbox_2d\": [55, 264, 863, 274], \"label\": \"Text\"}, {\"bbox_2d\": [55, 279, 118, 289], \"label\": \"Text\"}, {\"bbox_2d\": [25, 316, 959, 338], \"label\": \"Text\"}, {\"bbox_2d\": [25, 361, 276, 372], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 378, 158, 389], \"label\": \"Text\"}, {\"bbox_2d\": [508, 400, 525, 411], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 41, 996, 63], \"label\": \"Text\"}, {\"bbox_2d\": [25, 70, 1008, 103], \"label\": \"Text\"}, {\"bbox_2d\": [25, 109, 995, 131], \"label\": \"Text\"}, {\"bbox_2d\": [25, 138, 973, 171], \"label\": \"Text\"}, {\"bbox_2d\": [57, 177, 331, 188], \"label\": \"Text\"}, {\"bbox_2d\": [25, 234, 964, 256], \"label\": \"Text\"}, {\"bbox_2d\": [25, 194, 987, 228], \"label\": \"Text\"}, {\"bbox_2d\": [57, 262, 864, 273], \"label\": \"Text\"}, {\"bbox_2d\": [57, 279, 119, 290], \"label\": \"Text\"}, {\"bbox_2d\": [25, 315, 958, 337], \"label\": \"Text\"}, {\"bbox_2d\": [57, 377, 157, 388], \"label\": \"Text\"}, {\"bbox_2d\": [508, 401, 525, 411], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 362, 276, 372], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 299, 234, 310], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [60, 42, 464, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 88, 189, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 120, 260, 149], \"label\": \"Text\"}, {\"bbox_2d\": [60, 178, 272, 207], \"label\": \"Text\"}, {\"bbox_2d\": [60, 250, 201, 282], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 281, 256, 292], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 233, 366, 248], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 74, 802, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 535, 733, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 535, 263, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 677, 456, 692], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 707, 189, 725], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 567, 291, 647], \"label\": \"Text\"}, {\"bbox_2d\": [297, 567, 479, 650], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 567, 195, 647], \"label\": \"Text\"}, {\"bbox_2d\": [527, 647, 696, 663], \"label\": \"Text\"}, {\"bbox_2d\": [527, 647, 696, 663], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [527, 567, 732, 631], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2d\": [60, 707, 344, 824], \"label\": \"Text\"}, {\"bbox_2", "label": "```json[{\"bbox_2d\": [508, 990, 527, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [996, 402, 1014, 633], \"label\": \"Page-header\"}, {\"bbox_2d\": [525, 103, 952, 509], \"label\": \"Picture\"}, {\"bbox_2d\": [60, 513, 611, 532], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 41, 463, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 74, 802, 105], \"label\": \"Text\"}, {\"bbox_2d\": [325, 72, 441, 91], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 88, 188, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 234, 365, 251], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 249, 198, 283], \"label\": \"Section-header\"}, {\"bbox_2d\": [284, 279, 491, 413], \"label\": \"Picture\"}, {\"bbox_2d\": [293, 126, 508, 159], \"label\": \"Picture\"}, {\"bbox_2d\": [302, 173, 511, 199], \"label\": \"Picture\"}, {\"bbox_2d\": [60, 178, 271, 208], \"label\": \"Text\"}, {\"bbox_2d\": [60, 120, 257, 150], \"label\": \"Text\"}, {\"bbox_2d\": [56, 533, 970, 984], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [368, 79, 670, 91], \"label\": \"Page-header\"}, {\"bbox_2d\": [179, 138, 857, 158], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 209, 839, 243], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 297, 184, 312], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 211, 506, 281], \"label\": \"Text\"}, {\"bbox_2d\": [115, 318, 506, 388], \"label\": \"Text\"}, {\"bbox_2d\": [115, 393, 506, 583], \"label\": \"Text\"}, {\"bbox_2d\": [115, 588, 506, 817], \"label\": \"Text\"}, {\"bbox_2d\": [115, 823, 506, 935], \"label\": \"Text\"}, {\"bbox_2d\": [528, 851, 926, 944], \"label\": \"Text\"}, {\"bbox_2d\": [528, 725, 926, 837], \"label\": \"Text\"}, {\"bbox_2d\": [528, 541, 926, 709], \"label\": \"Text\"}, {\"bbox_2d\": [528, 251, 926, 535], \"label\": \"Text\"}, {\"bbox_2d\": [510, 1001, 523, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [367, 79, 668, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [178, 135, 857, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 210, 511, 279], \"label\": \"Text\"}, {\"bbox_2d\": [527, 249, 923, 533], \"label\": \"Text\"}, {\"bbox_2d\": [527, 541, 923, 708], \"label\": \"Text\"}, {\"bbox_2d\": [527, 726, 923, 835], \"label\": \"Text\"}, {\"bbox_2d\": [527, 853, 924, 942], \"label\": \"Text\"}, {\"bbox_2d\": [512, 999, 523, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [115, 824, 511, 932], \"label\": \"Text\"}, {\"bbox_2d\": [115, 590, 511, 815], \"label\": \"Text\"}, {\"bbox_2d\": [115, 395, 511, 581], \"label\": \"Text\"}, {\"bbox_2d\": [115, 317, 511, 387], \"label\": \"Text\"}, {\"bbox_2d\": [527, 209, 839, 221], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 230, 597, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 296, 184, 309], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [137, 97, 313, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 120, 351, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [142, 161, 205, 174], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 379, 248, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 422, 283, 434], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 508, 260, 520], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 551, 236, 563], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 594, 205, 606], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 770, 283, 782], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 810, 262, 822], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 725, 469, 737], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 744, 754, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 769, 525, 794], \"label\": \"Text\"}, {\"bbox_2d\": [137, 610, 914, 707], \"label\": \"Text\"}, {\"bbox_2d\": [137, 181, 914, 364], \"label\": \"Text\"}, {\"bbox_2d\": [137, 394, 531, 405], \"label\": \"Text\"}, {\"bbox_2d\": [137, 440, 517, 460], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 466, 486, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 483, 367, 491], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 525, 153, 536], \"label\": \"Text\"}, {\"bbox_2d\": [137, 566, 376, 577], \"label\": \"Text\"}, {\"bbox_2d\": [137, 798, 262, 808], \"label\": \"Text\"}, {\"bbox_2d\": [137, 827, 153, 838], \"label\": \"Text\"}, {\"bbox_2d\": [496, 941, 537, 953], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [137, 93, 312, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 117, 353, 146], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 160, 204, 175], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 177, 920, 364], \"label\": \"Text\"}, {\"bbox_2d\": [137, 391, 534, 406], \"label\": \"Text\"}, {\"bbox_2d\": [137, 434, 519, 492], \"label\": \"Text\"}, {\"bbox_2d\": [137, 521, 153, 536], \"label\": \"Text\"}, {\"bbox_2d\": [137, 419, 282, 435], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 376, 248, 392], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 506, 261, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 549, 235, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 592, 204, 607], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 563, 376, 579], \"label\": \"Text\"}, {\"bbox_2d\": [137, 607, 918, 708], \"label\": \"Text\"}, {\"bbox_2d\": [137, 736, 758, 751], \"label\": \"Text\"}, {\"bbox_2d\": [137, 779, 528, 794], \"label\": \"Text\"}, {\"bbox_2d\": [137, 822, 158, 837], \"label\": \"Text\"}, {\"bbox_2d\": [137, 721, 469, 736], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 764, 282, 780], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 807, 261, 823], \"label\": \"Section-header\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [296, 128, 714, 450], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 500, 911, 573], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 600, 911, 700], \"label\": \"Text\"}, {\"bbox_2d\": [121, 710, 911, 781], \"label\": \"Text\"}, {\"bbox_2d\": [121, 792, 911, 945], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 501, 911, 572], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 601, 911, 699], \"label\": \"Text\"}, {\"bbox_2d\": [121, 711, 911, 781], \"label\": \"Text\"}, {\"bbox_2d\": [121, 793, 911, 945], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [294, 122, 738, 450], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [248, 48, 434, 168], \"label\": \"Text\"}, {\"bbox_2d\": [465, 48, 651, 168], \"label\": \"Text\"}, {\"bbox_2d\": [736, 48, 935, 152], \"label\": \"Text\"}, {\"bbox_2d\": [279, 303, 942, 404], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [239, 48, 433, 168], \"label\": \"List-item\"}, {\"bbox_2d\": [731, 48, 933, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [464, 48, 650, 168], \"label\": \"Text\"}, {\"bbox_2d\": [60, 399, 1033, 989], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [150, 139, 844, 195], \"label\": \"Text\"}, {\"bbox_2d\": [150, 394, 841, 407], \"label\": \"Text\"}, {\"bbox_2d\": [150, 436, 840, 469], \"label\": \"Text\"}, {\"bbox_2d\": [150, 500, 342, 512], \"label\": \"Text\"}, {\"bbox_2d\": [150, 632, 267, 665], \"label\": \"Text\"}, {\"bbox_2d\": [150, 695, 376, 707], \"label\": \"Text\"}, {\"bbox_2d\": [907, 946, 934, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [892, 963, 1000, 1018], \"label\": \"Picture\"}, {\"bbox_2d\": [585, 69, 842, 81], \"label\": \"Section-header\"}, {\"bbox_2d\": [558, 140, 940, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [146, 223, 839, 364], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [586, 69, 844, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [151, 140, 943, 195], \"label\": \"Text\"}, {\"bbox_2d\": [147, 223, 839, 365], \"label\": \"Table\"}, {\"bbox_2d\": [151, 394, 838, 406], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 437, 838, 470], \"label\": \"Text\"}, {\"bbox_2d\": [151, 500, 342, 512], \"label\": \"Text\"}, {\"bbox_2d\": [151, 632, 268, 665], \"label\": \"Text\"}, {\"bbox_2d\": [151, 696, 378, 707], \"label\": \"Text\"}, {\"bbox_2d\": [893, 962, 985, 1017], \"label\": \"Picture\"}, {\"bbox_2d\": [906, 948, 927, 958], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 27, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 83, 922, 125], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 128, 923, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 218, 887, 262], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 287, 868, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 310, 888, 352], \"label\": \"Text\"}, {\"bbox_2d\": [109, 378, 768, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 402, 925, 468], \"label\": \"Text\"}, {\"bbox_2d\": [109, 604, 954, 669], \"label\": \"Text\"}, {\"bbox_2d\": [109, 694, 933, 735], \"label\": \"Text\"}, {\"bbox_2d\": [109, 528, 921, 582], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 782, 232, 807], \"label\": \"Section-header\"}, {\"bbox_2d\": [586, 790, 666, 814], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 817, 413, 918], \"label\": \"Text\"}, {\"bbox_2d\": [586, 825, 885, 920], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [110, 82, 924, 126], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 126, 928, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 602, 958, 668], \"label\": \"Text\"}, {\"bbox_2d\": [110, 691, 936, 735], \"label\": \"Text\"}, {\"bbox_2d\": [110, 217, 893, 261], \"label\": \"Text\"}, {\"bbox_2d\": [110, 286, 893, 352], \"label\": \"Text\"}, {\"bbox_2d\": [110, 377, 933, 468], \"label\": \"Text\"}, {\"bbox_2d\": [110, 522, 925, 583], \"label\": \"Section-header\"}, {\"bbox_2d\": [110, 762, 475, 934], \"label\": \"Picture\"}, {\"bbox_2d\": [569, 763, 914, 938], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [893, 47, 959, 67], \"label\": \"Page-header\"}, {\"bbox_2d\": [344, 100, 952, 246], \"label\": \"Text\"}, {\"bbox_2d\": [344, 265, 936, 299], \"label\": \"Text\"}, {\"bbox_2d\": [92, 324, 701, 439], \"label\": \"Text\"}, {\"bbox_2d\": [344, 551, 951, 649], \"label\": \"Text\"}, {\"bbox_2d\": [92, 472, 575, 489], \"label\": \"Text\"}, {\"bbox_2d\": [344, 665, 707, 682], \"label\": \"Text\"}, {\"bbox_2d\": [92, 775, 703, 906], \"label\": \"Text\"}, {\"bbox_2d\": [344, 812, 569, 822], \"label\": \"Section-header\"}, {\"bbox_2d\": [162, 735, 248, 750], \"label\": \"Caption\"}, {\"bbox_2d\": [92, 923, 677, 957], \"label\": \"Text\"}, {\"bbox_2d\": [892, 987, 976, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [777, 951, 885, 966], \"label\": \"Caption\"}, {\"bbox_2d\": [715, 765, 946, 944], \"label\": \"Picture\"}, {\"bbox_2d\": [716, 323, 948, 503], \"label\": \"Picture\"}, {\"bbox_2d\": [89, 549, 322, 729], \"label\": \"Picture\"}, {\"bbox_2d\": [92, 99, 322, 278], \"label\": \"Picture\"}, {\"bbox_2d\": [147, 284, 264, 300], \"label\": \"Caption\"}, {\"bbox_2d\": [784, 509, 879, 525], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [781, 51, 963, 66], \"label\": \"Page-header\"}, {\"bbox_2d\": [343, 102, 947, 246], \"label\": \"Text\"}, {\"bbox_2d\": [343, 267, 947, 296], \"label\": \"Text\"}, {\"bbox_2d\": [93, 327, 699, 438], \"label\": \"Text\"}, {\"bbox_2d\": [93, 476, 592, 488], \"label\": \"Text\"}, {\"bbox_2d\": [344, 553, 969, 648], \"label\": \"Text\"}, {\"bbox_2d\": [344, 669, 729, 681], \"label\": \"Text\"}, {\"bbox_2d\": [93, 778, 695, 906], \"label\": \"Text\"}, {\"bbox_2d\": [93, 926, 701, 955], \"label\": \"Text\"}, {\"bbox_2d\": [784, 990, 974, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [778, 952, 879, 965], \"label\": \"Caption\"}, {\"bbox_2d\": [162, 736, 245, 749], \"label\": \"Caption\"}, {\"bbox_2d\": [785, 512, 876, 524], \"label\": \"Caption\"}, {\"bbox_2d\": [147, 286, 261, 299], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [301, 36, 732, 57], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 79, 518, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 260, 309, 270], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 407, 245, 417], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 448, 199, 457], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 507, 83, 517], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 526, 967, 556], \"label\": \"Text\"}, {\"bbox_2d\": [25, 465, 994, 499], \"label\": \"Text\"}, {\"bbox_2d\": [25, 344, 1003, 397], \"label\": \"Text\"}, {\"bbox_2d\": [25, 277, 1001, 331], \"label\": \"Text\"}, {\"bbox_2d\": [25, 207, 983, 250], \"label\": \"Text\"}, {\"bbox_2d\": [25, 96, 980, 201], \"label\": \"Text\"}, {\"bbox_2d\": [58, 424, 868, 434], \"label\": \"Text\"}, {\"bbox_2d\": [508, 571, 525, 581], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [301, 35, 733, 46], \"label\": \"Section-header\"}, {\"bbox_2d\": [398, 46, 635, 57], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 78, 517, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 260, 307, 270], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 407, 246, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 447, 198, 458], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 506, 83, 517], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 523, 967, 557], \"label\": \"Text\"}, {\"bbox_2d\": [25, 464, 995, 497], \"label\": \"Text\"}, {\"bbox_2d\": [57, 424, 868, 435], \"label\": \"Text\"}, {\"bbox_2d\": [25, 342, 1004, 398], \"label\": \"Text\"}, {\"bbox_2d\": [25, 277, 1006, 333], \"label\": \"Text\"}, {\"bbox_2d\": [25, 206, 982, 250], \"label\": \"Text\"}, {\"bbox_2d\": [25, 95, 984, 197], \"label\": \"Text\"}, {\"bbox_2d\": [508, 569, 525, 580], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 24, 155, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [23, 113, 1011, 458], \"label\": \"Table\"}, {\"bbox_2d\": [312, 471, 720, 482], \"label\": \"Text\"}, {\"bbox_2d\": [504, 505, 529, 516], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [422, 71, 611, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [347, 82, 687, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [312, 470, 722, 481], \"label\": \"Text\"}, {\"bbox_2d\": [504, 504, 529, 515], \"label\": \"Page-footer\"}, {\"bbox_2d\": [24, 116, 1011, 457], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [69, 153, 300, 170], \"label\": \"Section-header\"}, {\"bbox_2d\": [69, 177, 967, 237], \"label\": \"Text\"}, {\"bbox_2d\": [69, 475, 962, 510], \"label\": \"Text\"}, {\"bbox_2d\": [69, 940, 172, 971], \"label\": \"Page-footer\"}, {\"bbox_2d\": [894, 940, 940, 954], \"label\": \"Page-footer\"}, {\"bbox_2d\": [67, 247, 959, 443], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [69, 148, 303, 171], \"label\": \"Section-header\"}, {\"bbox_2d\": [69, 172, 969, 237], \"label\": \"Text\"}, {\"bbox_2d\": [69, 472, 964, 511], \"label\": \"Footnote\"}, {\"bbox_2d\": [891, 937, 944, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [69, 937, 175, 973], \"label\": \"Page-footer\"}, {\"bbox_2d\": [66, 245, 959, 443], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [294, 83, 596, 136], \"label\": \"Text\"}, {\"bbox_2d\": [525, 975, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 191, 899, 594], \"label\": \"Picture\"}, {\"bbox_2d\": [294, 605, 483, 618], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [293, 83, 597, 96], \"label\": \"Text\"}, {\"bbox_2d\": [301, 109, 552, 122], \"label\": \"Text\"}, {\"bbox_2d\": [301, 122, 483, 135], \"label\": \"Text\"}, {\"bbox_2d\": [293, 606, 483, 617], \"label\": \"Caption\"}, {\"bbox_2d\": [294, 190, 897, 592], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [689, 146, 907, 187], \"label\": \"Picture\"}, {\"bbox_2d\": [28, 410, 204, 504], \"label\": \"Text\"}, {\"bbox_2d\": [662, 245, 811, 371], \"label\": \"Text\"}, {\"bbox_2d\": [369, 646, 637, 689], \"label\": \"Text\"}, {\"bbox_2d\": [683, 784, 909, 862], \"label\": \"Text\"}, {\"bbox_2d\": [258, 629, 611, 642], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [687, 147, 909, 185], \"label\": \"Picture\"}, {\"bbox_2d\": [290, 245, 638, 528], \"label\": \"Picture\"}, {\"bbox_2d\": [366, 709, 624, 897], \"label\": \"Picture\"}, {\"bbox_2d\": [679, 528, 919, 759], \"label\": \"Picture\"}, {\"bbox_2d\": [28, 519, 211, 803], \"label\": \"Picture\"}, {\"bbox_2d\": [661, 245, 815, 369], \"label\": \"Caption\"}, {\"bbox_2d\": [683, 784, 915, 859], \"label\": \"Caption\"}, {\"bbox_2d\": [367, 629, 642, 689], \"label\": \"Caption\"}, {\"bbox_2d\": [30, 411, 207, 502], \"label\": \"Caption\"}, {\"bbox_2d\": [259, 363, 282, 402], \"label\": \"Text\"}, {\"bbox_2d\": [217, 561, 240, 600], \"label\": \"Text\"}, {\"bbox_2d\": [335, 782, 358, 821], \"label\": \"Text\"}, {\"bbox_2d\": [651, 580, 674, 619], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [59, 35, 240, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [79, 145, 271, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 495, 267, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 693, 320, 725], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 238, 429, 253], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 319, 297, 333], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 368, 335, 382], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 458, 294, 472], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 513, 338, 573], \"label\": \"Text\"}, {\"bbox_2d\": [79, 582, 353, 642], \"label\": \"Text\"}, {\"bbox_2d\": [381, 513, 652, 535], \"label\": \"Text\"}, {\"bbox_2d\": [381, 544, 648, 642], \"label\": \"Text\"}, {\"bbox_2d\": [684, 513, 954, 548], \"label\": \"Text\"}, {\"bbox_2d\": [684, 558, 956, 642], \"label\": \"Text\"}, {\"bbox_2d\": [79, 983, 193, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [421, 667, 956, 944], \"label\": \"Table\"}, {\"bbox_2d\": [79, 359, 956, 475], \"label\": \"Table\"}, {\"bbox_2d\": [79, 242, 956, 348], \"label\": \"Table\"}, {\"bbox_2d\": [759, 192, 956, 230], \"label\": \"Table\"}, {\"bbox_2d\": [79, 70, 167, 81], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [78, 143, 271, 162], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 190, 961, 476], \"label\": \"Table\"}, {\"bbox_2d\": [78, 493, 268, 508], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 512, 345, 573], \"label\": \"Text\"}, {\"bbox_2d\": [78, 581, 353, 642], \"label\": \"Text\"}, {\"bbox_2d\": [381, 512, 655, 536], \"label\": \"Text\"}, {\"bbox_2d\": [381, 544, 655, 642], \"label\": \"Text\"}, {\"bbox_2d\": [684, 512, 943, 548], \"label\": \"Text\"}, {\"bbox_2d\": [684, 556, 958, 642], \"label\": \"Text\"}, {\"bbox_2d\": [76, 664, 958, 943], \"label\": \"Table\"}, {\"bbox_2d\": [78, 984, 193, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [57, 34, 240, 80], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [71, 61, 275, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [110, 101, 884, 457], \"label\": \"Picture\"}, {\"bbox_2d\": [111, 477, 881, 749], \"label\": \"Picture\"}, {\"bbox_2d\": [110, 770, 805, 951], \"label\": \"Picture\"}, {\"bbox_2d\": [386, 967, 632, 983], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 990, 516, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [70, 61, 294, 91], \"label\": \"Page-header\"}, {\"bbox_2d\": [110, 93, 896, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [385, 964, 648, 979], \"label\": \"Caption\"}, {\"bbox_2d\": [498, 989, 531, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [776, 72, 956, 102], \"label\": \"Page-header\"}, {\"bbox_2d\": [550, 116, 960, 297], \"label\": \"Text\"}, {\"bbox_2d\": [550, 303, 960, 389], \"label\": \"Text\"}, {\"bbox_2d\": [550, 395, 960, 690], \"label\": \"Text\"}, {\"bbox_2d\": [98, 114, 510, 165], \"label\": \"Text\"}, {\"bbox_2d\": [98, 169, 510, 278], \"label\": \"Text\"}, {\"bbox_2d\": [98, 293, 510, 308], \"label\": \"Text\"}, {\"bbox_2d\": [98, 325, 510, 339], \"label\": \"Text\"}, {\"bbox_2d\": [98, 356, 510, 465], \"label\": \"Text\"}, {\"bbox_2d\": [98, 472, 510, 523], \"label\": \"Text\"}, {\"bbox_2d\": [98, 530, 510, 661], \"label\": \"Text\"}, {\"bbox_2d\": [98, 665, 510, 959], \"label\": \"Text\"}, {\"bbox_2d\": [549, 739, 960, 959], \"label\": \"Text\"}, {\"bbox_2d\": [549, 717, 749, 731], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 989, 537, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [776, 69, 977, 100], \"label\": \"Page-header\"}, {\"bbox_2d\": [101, 110, 515, 164], \"label\": \"Text\"}, {\"bbox_2d\": [99, 166, 528, 276], \"label\": \"Text\"}, {\"bbox_2d\": [98, 288, 517, 306], \"label\": \"Text\"}, {\"bbox_2d\": [100, 320, 520, 339], \"label\": \"Text\"}, {\"bbox_2d\": [98, 352, 521, 659], \"label\": \"Text\"}, {\"bbox_2d\": [98, 660, 524, 956], \"label\": \"Text\"}, {\"bbox_2d\": [517, 982, 545, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [549, 711, 758, 728], \"label\": \"Section-header\"}, {\"bbox_2d\": [548, 735, 969, 956], \"label\": \"Text\"}, {\"bbox_2d\": [548, 390, 971, 685], \"label\": \"Text\"}, {\"bbox_2d\": [550, 111, 974, 295], \"label\": \"Text\"}, {\"bbox_2d\": [550, 297, 972, 388], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 48, 295, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 127, 918, 175], \"label\": \"Text\"}, {\"bbox_2d\": [121, 194, 918, 242], \"label\": \"Text\"}, {\"bbox_2d\": [121, 243, 918, 276], \"label\": \"Text\"}, {\"bbox_2d\": [121, 391, 918, 440], \"label\": \"Text\"}, {\"bbox_2d\": [121, 307, 918, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 376, 607, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 443, 417, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 575, 918, 623], \"label\": \"Text\"}, {\"bbox_2d\": [121, 624, 908, 657], \"label\": \"Text\"}, {\"bbox_2d\": [121, 740, 910, 788], \"label\": \"Text\"}, {\"bbox_2d\": [121, 790, 869, 813], \"label\": \"Text\"}, {\"bbox_2d\": [121, 815, 918, 871], \"label\": \"Text\"}, {\"bbox_2d\": [146, 872, 917, 937], \"label\": \"Text\"}, {\"bbox_2d\": [121, 277, 683, 307], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 178, 407, 193], \"label\": \"Section-header\"}, {\"bbox_2d\": [182, 460, 889, 492], \"label\": \"Text\"}, {\"bbox_2d\": [182, 493, 914, 508], \"label\": \"Text\"}, {\"bbox_2d\": [121, 510, 903, 573], \"label\": \"Text\"}, {\"bbox_2d\": [146, 659, 783, 673], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 674, 882, 707], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 708, 918, 739], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 974, 445, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"Text\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"Text\"}, {\"bbox_2d\": [121, 126, 950, 176], \"label\": \"Text\"}, {\"bbox_2d\": [121, 193, 956, 242], \"label\": \"Text\"}, {\"bbox_2d\": [121, 242, 949, 275], \"label\": \"Text\"}, {\"bbox_2d\": [121, 391, 959, 440], \"label\": \"Text\"}, {\"bbox_2d\": [121, 457, 918, 490], \"label\": \"Text\"}, {\"bbox_2d\": [182, 491, 920, 507], \"label\": \"Text\"}, {\"bbox_2d\": [121, 507, 929, 573], \"label\": \"Text\"}, {\"bbox_2d\": [121, 573, 949, 622], \"label\": \"Text\"}, {\"bbox_2d\": [121, 623, 946, 655], \"label\": \"Text\"}, {\"bbox_2d\": [121, 739, 942, 788], \"label\": \"Text\"}, {\"bbox_2d\": [121, 788, 898, 821], \"label\": \"Text\"}, {\"bbox_2d\": [121, 176, 414, 192], \"label\": \"Section-header\"}, {\"bbox_2d\": [483, 275, 557, 291], \"label\": \"Section-header\"}, {\"bbox_2d\": [356, 292, 684, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [458, 341, 582, 358], \"label\": \"Section-header\"}, {\"bbox_2d\": [379, 358, 655, 374], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 375, 631, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 441, 418, 457], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 308, 962, 341], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 656, 782, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 672, 914, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 706, 959, 738], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 821, 951, 871], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 871, 954, 937], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 920, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 77, 991, 679], \"label\": \"Table\"}, {\"bbox_2d\": [86, 682, 384, 699], \"label\": \"List-item\"}, {\"bbox_2d\": [114, 692, 861, 774], \"label\": \"Text\"}, {\"bbox_2d\": [114, 772, 862, 867], \"label\": \"Text\"}, {\"bbox_2d\": [114, 872, 862, 934], \"label\": \"Text\"}, {\"bbox_2d\": [509, 937, 597, 966], \"label\": \"Formula\"}, {\"bbox_2d\": [481, 1003, 589, 1019], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 78, 986, 676], \"label\": \"Table\"}, {\"bbox_2d\": [124, 693, 972, 772], \"label\": \"Text\"}, {\"bbox_2d\": [124, 767, 955, 875], \"label\": \"Text\"}, {\"bbox_2d\": [124, 871, 962, 934], \"label\": \"Text\"}, {\"bbox_2d\": [481, 1003, 589, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [527, 937, 593, 963], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [789, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [505, 93, 523, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [553, 153, 891, 965], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 155, 401, 957], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 74, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 93, 521, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [789, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [156, 156, 400, 955], \"label\": \"Picture\"}, {\"bbox_2d\": [548, 154, 793, 970], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [464, 67, 544, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [298, 89, 317, 101], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 106, 490, 227], \"label\": \"Text\"}, {\"bbox_2d\": [125, 230, 499, 398], \"label\": \"Text\"}, {\"bbox_2d\": [125, 404, 506, 588], \"label\": \"Text\"}, {\"bbox_2d\": [125, 590, 506, 685], \"label\": \"Text\"}, {\"bbox_2d\": [125, 688, 506, 974], \"label\": \"Text\"}, {\"bbox_2d\": [520, 103, 894, 375], \"label\": \"Text\"}, {\"bbox_2d\": [520, 378, 895, 685], \"label\": \"Text\"}, {\"bbox_2d\": [520, 688, 895, 738], \"label\": \"Text\"}, {\"bbox_2d\": [520, 740, 895, 969], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [297, 87, 322, 103], \"label\": \"Page-header\"}, {\"bbox_2d\": [465, 66, 548, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [693, 84, 716, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [531, 763, 903, 970], \"label\": \"Text\"}, {\"bbox_2d\": [134, 689, 509, 973], \"label\": \"Text\"}, {\"bbox_2d\": [132, 586, 505, 689], \"label\": \"Text\"}, {\"bbox_2d\": [130, 401, 502, 587], \"label\": \"Text\"}, {\"bbox_2d\": [525, 375, 899, 763], \"label\": \"Text\"}, {\"bbox_2d\": [521, 101, 893, 375], \"label\": \"Text\"}, {\"bbox_2d\": [127, 229, 500, 402], \"label\": \"Text\"}, {\"bbox_2d\": [125, 104, 497, 229], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 29, 613, 39], \"label\": \"Page-header\"}, {\"bbox_2d\": [471, 88, 572, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [107, 138, 928, 176], \"label\": \"Text\"}, {\"bbox_2d\": [107, 207, 813, 222], \"label\": \"Text\"}, {\"bbox_2d\": [107, 321, 928, 358], \"label\": \"Text\"}, {\"bbox_2d\": [107, 459, 893, 496], \"label\": \"Text\"}, {\"bbox_2d\": [107, 604, 864, 642], \"label\": \"Text\"}, {\"bbox_2d\": [107, 858, 922, 904], \"label\": \"Text\"}, {\"bbox_2d\": [222, 671, 786, 684], \"label\": \"Text\"}, {\"bbox_2d\": [337, 523, 694, 535], \"label\": \"Text\"}, {\"bbox_2d\": [356, 249, 606, 261], \"label\": \"Text\"}, {\"bbox_2d\": [218, 385, 775, 397], \"label\": \"Text\"}, {\"bbox_2d\": [218, 671, 300, 684], \"label\": \"Text\"}, {\"bbox_2d\": [222, 249, 345, 261], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [471, 79, 574, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [107, 133, 935, 177], \"label\": \"Text\"}, {\"bbox_2d\": [107, 201, 815, 224], \"label\": \"Text\"}, {\"bbox_2d\": [379, 243, 607, 265], \"label\": \"Text\"}, {\"bbox_2d\": [107, 315, 940, 360], \"label\": \"Text\"}, {\"bbox_2d\": [217, 379, 776, 401], \"label\": \"Text\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [107, 453, 902, 497], \"label\": \"Text\"}, {\"bbox_2d\": [336, 517, 696, 539], \"label\": \"Text\"}, {\"bbox_2d\": [107, 599, 865, 643], \"label\": \"Text\"}, {\"bbox_2d\": [223, 665, 788, 687], \"label\": \"Text\"}, {\"bbox_2d\": [93, 719, 959, 922], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 50, 610, 84], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 108, 189, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 315, 156], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 289, 409, 302], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 290, 410, 300], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 315, 930, 325], \"label\": \"Text\"}, {\"bbox_2d\": [121, 341, 380, 351], \"label\": \"Text\"}, {\"bbox_2d\": [121, 519, 957, 559], \"label\": \"Text\"}, {\"bbox_2d\": [121, 706, 964, 755], \"label\": \"Text\"}, {\"bbox_2d\": [117, 186, 954, 257], \"label\": \"Table\"}, {\"bbox_2d\": [117, 359, 954, 498], \"label\": \"Table\"}, {\"bbox_2d\": [117, 586, 716, 683], \"label\": \"Table\"}, {\"bbox_2d\": [749, 988, 975, 998], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [130, 45, 568, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 70, 610, 85], \"label\": \"Text\"}, {\"bbox_2d\": [121, 102, 189, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 137, 321, 156], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 172, 953, 254], \"label\": \"Table\"}, {\"bbox_2d\": [120, 336, 953, 496], \"label\": \"Table\"}, {\"bbox_2d\": [121, 311, 935, 325], \"label\": \"Text\"}, {\"bbox_2d\": [121, 285, 410, 303], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 572, 240, 586], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 516, 967, 543], \"label\": \"Text\"}, {\"bbox_2d\": [121, 546, 261, 559], \"label\": \"Text\"}, {\"bbox_2d\": [121, 701, 974, 755], \"label\": \"Text\"}, {\"bbox_2d\": [120, 590, 712, 681], \"label\": \"Table\"}, {\"bbox_2d\": [749, 987, 976, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 83, 455, 958], \"label\": \"Text\"}, {\"bbox_2d\": [399, 83, 651, 958], \"label\": \"Text\"}, {\"bbox_2d\": [677, 83, 902, 955], \"label\": \"Text\"}, {\"bbox_2d\": [677, 161, 691, 178], \"label\": \"Section-header\"}, {\"bbox_2d\": [848, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [675, 85, 900, 135], \"label\": \"Text\"}, {\"bbox_2d\": [675, 160, 693, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [675, 183, 909, 950], \"label\": \"Text\"}, {\"bbox_2d\": [398, 85, 653, 957], \"label\": \"Text\"}, {\"bbox_2d\": [121, 85, 378, 957], \"label\": \"Text\"}, {\"bbox_2d\": [850, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [302, 53, 326, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [463, 53, 596, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [737, 53, 759, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [211, 85, 506, 169], \"label\": \"Text\"}, {\"bbox_2d\": [123, 187, 506, 329], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 346, 506, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 448, 506, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [182, 506, 506, 618], \"label\": \"Text\"}, {\"bbox_2d\": [123, 636, 506, 663], \"label\": \"List-item\"}, {\"bbox_2d\": [182, 679, 506, 924], \"label\": \"Text\"}, {\"bbox_2d\": [613, 128, 937, 333], \"label\": \"Text\"}, {\"bbox_2d\": [613, 450, 937, 502], \"label\": \"Text\"}, {\"bbox_2d\": [613, 520, 937, 577], \"label\": \"Text\"}, {\"bbox_2d\": [613, 593, 937, 649], \"label\": \"Text\"}, {\"bbox_2d\": [613, 666, 937, 707], \"label\": \"Text\"}, {\"bbox_2d\": [613, 725, 937, 766], \"label\": \"Text\"}, {\"bbox_2d\": [613, 783, 937, 924], \"label\": \"Text\"}, {\"bbox_2d\": [520, 988, 536, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [554, 85, 937, 112], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 346, 937, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 405, 937, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 515, 937, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 666, 937, 707], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [305, 55, 329, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [463, 54, 595, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [736, 55, 760, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [211, 84, 523, 170], \"label\": \"Text\"}, {\"bbox_2d\": [182, 505, 522, 562], \"label\": \"Text\"}, {\"bbox_2d\": [182, 564, 517, 620], \"label\": \"Text\"}, {\"bbox_2d\": [182, 680, 521, 765], \"label\": \"Text\"}, {\"bbox_2d\": [182, 767, 520, 925], \"label\": \"Text\"}, {\"bbox_2d\": [613, 781, 946, 882], \"label\": \"Text\"}, {\"bbox_2d\": [613, 883, 948, 925], \"label\": \"Text\"}, {\"bbox_2d\": [613, 447, 943, 504], \"label\": \"Text\"}, {\"bbox_2d\": [613, 128, 963, 184], \"label\": \"Text\"}, {\"bbox_2d\": [613, 186, 954, 330], \"label\": \"Text\"}, {\"bbox_2d\": [123, 186, 516, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 345, 517, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 447, 529, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 636, 529, 664], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 346, 961, 388], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 404, 961, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 520, 953, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 592, 961, 649], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 665, 961, 707], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 723, 948, 765], \"label\": \"List-item\"}, {\"bbox_2d\": [554, 84, 961, 112], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 989, 538, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 73, 945, 122], \"label\": \"Text\"}, {\"bbox_2d\": [91, 135, 945, 468], \"label\": \"Text\"}, {\"bbox_2d\": [91, 479, 945, 779], \"label\": \"Text\"}, {\"bbox_2d\": [91, 793, 945, 965], \"label\": \"Text\"}, {\"bbox_2d\": [513, 994, 522, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 75, 944, 120], \"label\": \"Text\"}, {\"bbox_2d\": [91, 137, 944, 467], \"label\": \"Text\"}, {\"bbox_2d\": [91, 481, 944, 777], \"label\": \"Text\"}, {\"bbox_2d\": [91, 793, 944, 964], \"label\": \"Text\"}, {\"bbox_2d\": [513, 993, 522, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [108, 44, 307, 72], \"label\": \"Page-header\"}, {\"bbox_2d\": [181, 117, 801, 283], \"label\": \"Picture\"}, {\"bbox_2d\": [182, 351, 906, 811], \"label\": \"Picture\"}, {\"bbox_2d\": [417, 846, 668, 864], \"label\": \"Caption\"}, {\"bbox_2d\": [533, 979, 559, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [106, 42, 321, 73], \"label\": \"Page-header\"}, {\"bbox_2d\": [176, 117, 909, 810], \"label\": \"Picture\"}, {\"bbox_2d\": [417, 845, 661, 860], \"label\": \"Caption\"}, {\"bbox_2d\": [532, 975, 564, 990], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 112, 1009, 133], \"label\": \"Text\"}, {\"bbox_2d\": [25, 157, 1009, 190], \"label\": \"Text\"}, {\"bbox_2d\": [25, 213, 1009, 295], \"label\": \"Text\"}, {\"bbox_2d\": [25, 308, 1009, 338], \"label\": \"Text\"}, {\"bbox_2d\": [25, 350, 1009, 394], \"label\": \"Text\"}, {\"bbox_2d\": [25, 406, 1009, 435], \"label\": \"Text\"}, {\"bbox_2d\": [25, 462, 1009, 491], \"label\": \"Text\"}, {\"bbox_2d\": [25, 503, 1009, 524], \"label\": \"Text\"}, {\"bbox_2d\": [25, 536, 1009, 557], \"label\": \"Text\"}, {\"bbox_2d\": [25, 570, 1009, 602], \"label\": \"Text\"}, {\"bbox_2d\": [25, 630, 1009, 648], \"label\": \"Text\"}, {\"bbox_2d\": [25, 660, 1009, 680], \"label\": \"Text\"}, {\"bbox_2d\": [25, 682, 1009, 699], \"label\": \"Text\"}, {\"bbox_2d\": [25, 702, 1009, 730], \"label\": \"Text\"}, {\"bbox_2d\": [25, 614, 250, 626], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 448, 257, 459], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 308, 174, 318], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 202, 157, 213], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 147, 59, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [982, 25, 1009, 36], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [870, 24, 1009, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 114, 1009, 133], \"label\": \"Text\"}, {\"bbox_2d\": [25, 159, 1009, 189], \"label\": \"Text\"}, {\"bbox_2d\": [25, 146, 59, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 202, 157, 213], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 306, 171, 317], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 447, 257, 459], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 569, 313, 581], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 614, 250, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 660, 1009, 680], \"label\": \"Text\"}, {\"bbox_2d\": [25, 627, 1009, 647], \"label\": \"Text\"}, {\"bbox_2d\": [25, 581, 1009, 601], \"label\": \"Text\"}, {\"bbox_2d\": [25, 536, 1009, 556], \"label\": \"Text\"}, {\"bbox_2d\": [25, 503, 1009, 523], \"label\": \"Text\"}, {\"bbox_2d\": [25, 460, 1009, 489], \"label\": \"Text\"}, {\"bbox_2d\": [25, 404, 1009, 434], \"label\": \"Text\"}, {\"bbox_2d\": [25, 351, 1009, 391], \"label\": \"Text\"}, {\"bbox_2d\": [25, 318, 1009, 338], \"label\": \"Text\"}, {\"bbox_2d\": [25, 213, 1009, 293], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [60, 37, 240, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 202, 784, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 348, 346, 364], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 613, 296, 647], \"label\": \"Section-header\"}, {\"bbox_2d\": [685, 604, 872, 618], \"label\": \"Section-header\"}, {\"bbox_2d\": [685, 871, 954, 890], \"label\": \"Text\"}, {\"bbox_2d\": [685, 789, 954, 863], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 699, 236, 710], \"label\": \"Text\"}, {\"bbox_2d\": [78, 984, 193, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [381, 833, 652, 863], \"label\": \"Text\"}, {\"bbox_2d\": [381, 833, 652, 851], \"label\": \"Text\"}, {\"bbox_2d\": [78, 372, 341, 420], \"label\": \"Text\"}, {\"bbox_2d\": [78, 428, 348, 513], \"label\": \"Text\"}, {\"bbox_2d\": [78, 522, 341, 593], \"label\": \"Text\"}, {\"bbox_2d\": [381, 549, 640, 583], \"label\": \"Text\"}, {\"bbox_2d\": [381, 592, 652, 690], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, 338, 691], \"label\": \"Text\"}, {\"bbox_2d\": [78, 654, ", "label": "```json[{\"bbox_2d\": [57, 34, 240, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 194, 780, 309], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 346, 347, 364], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 610, 297, 646], \"label\": \"Section-header\"}, {\"bbox_2d\": [684, 600, 873, 619], \"label\": \"Section-header\"}, {\"bbox_2d\": [381, 706, 592, 725], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 984, 193, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [16, 975, 66, 1009], \"label\": \"Picture\"}, {\"bbox_2d\": [381, 913, 654, 949], \"label\": \"Text\"}, {\"bbox_2d\": [381, 832, 643, 905], \"label\": \"Text\"}, {\"bbox_2d\": [381, 788, 650, 824], \"label\": \"Text\"}, {\"bbox_2d\": [381, 732, 625, 780], \"label\": \"Text\"}, {\"bbox_2d\": [684, 695, 960, 780], \"label\": \"Text\"}, {\"bbox_2d\": [684, 788, 959, 861], \"label\": \"Text\"}, {\"bbox_2d\": [684, 869, 958, 892], \"label\": \"Text\"}, {\"bbox_2d\": [684, 626, 944, 687], \"label\": \"Text\"}, {\"bbox_2d\": [381, 592, 654, 689], \"label\": \"Text\"}, {\"bbox_2d\": [381, 548, 643, 584], \"label\": \"Text\"}, {\"bbox_2d\": [684, 523, 954, 583], \"label\": \"Text\"}, {\"bbox_2d\": [78, 521, 351, 593], \"label\": \"Text\"}, {\"bbox_2d\": [78, 653, 342, 689], \"label\": \"Text\"}, {\"bbox_2d\": [78, 698, 236, 709], \"label\": \"Text\"}, {\"bbox_2d\": [684, 349, 959, 459], \"label\": \"Text\"}, {\"bbox_2d\": [684, 467, 953, 515], \"label\": \"Text\"}, {\"bbox_2d\": [78, 371, 345, 420], \"label\": \"Text\"}, {\"bbox_2d\": [78, 427, 352, 513], \"label\": \"Text\"}, {\"bbox_2d\": [684, 897, 960, 945], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 917, 330, 942], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 888, 345, 912], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 844, 348, 882], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 801, 334, 839], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 745, 345, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 715, 314, 739], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 349, 639, 387], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 392, 657, 417], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 422, 640, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 452, 642, 477], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 482, 645, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 512, 650, 537], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [229, 133, 805, 146], \"label\": \"Text\"}, {\"bbox_2d\": [94, 928, 936, 956], \"label\": \"Footnote\"}, {\"bbox_2d\": [97, 979, 195, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [900, 958, 938, 998], \"label\": \"Page-footer\"}, {\"bbox_2d\": [829, 84, 936, 100], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [828, 80, 942, 101], \"label\": \"Page-header\"}, {\"bbox_2d\": [229, 129, 805, 149], \"label\": \"Text\"}, {\"bbox_2d\": [98, 927, 942, 957], \"label\": \"Footnote\"}, {\"bbox_2d\": [899, 955, 945, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 974, 200, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 92, 311, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 225, 488, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 325, 321, 339], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 441, 402, 456], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 495, 376, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 581, 444, 595], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 650, 478, 664], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 719, 393, 733], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 790, 354, 804], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 859, 340, 873], \"label\": \"Section-header\"}, {\"bbox_2d\": [702, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 879, 926, 937], \"label\": \"Text\"}, {\"bbox_2d\": [231, 809, 885, 836], \"label\": \"Text\"}, {\"bbox_2d\": [231, 740, 909, 767], \"label\": \"Text\"}, {\"bbox_2d\": [231, 670, 878, 698], \"label\": \"Text\"}, {\"bbox_2d\": [231, 602, 888, 629], \"label\": \"Text\"}, {\"bbox_2d\": [231, 516, 908, 559], \"label\": \"Text\"}, {\"bbox_2d\": [231, 462, 445, 474], \"label\": \"Text\"}, {\"bbox_2d\": [231, 346, 919, 420], \"label\": \"Text\"}, {\"bbox_2d\": [231, 245, 926, 304], \"label\": \"Text\"}, {\"bbox_2d\": [231, 113, 918, 203], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 311, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 225, 489, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 326, 321, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 442, 402, 457], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 496, 377, 510], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 581, 443, 595], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 650, 478, 665], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 719, 393, 734], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 789, 353, 803], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 858, 339, 873], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 878, 926, 937], \"label\": \"Text\"}, {\"bbox_2d\": [231, 809, 881, 836], \"label\": \"Text\"}, {\"bbox_2d\": [231, 739, 935, 767], \"label\": \"Text\"}, {\"bbox_2d\": [231, 670, 883, 698], \"label\": \"Text\"}, {\"bbox_2d\": [231, 601, 889, 628], \"label\": \"Text\"}, {\"bbox_2d\": [231, 516, 911, 559], \"label\": \"Text\"}, {\"bbox_2d\": [231, 462, 474, 474], \"label\": \"Text\"}, {\"bbox_2d\": [231, 346, 920, 420], \"label\": \"Text\"}, {\"bbox_2d\": [231, 245, 926, 304], \"label\": \"Text\"}, {\"bbox_2d\": [231, 113, 918, 204], \"label\": \"Text\"}, {\"bbox_2d\": [701, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [816, 999, 1004, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [53, 719, 190, 733], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 371, 354, 386], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 266, 483, 282], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 134, 283, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 178, 236, 192], \"label\": \"Text\"}, {\"bbox_2d\": [79, 197, 921, 210], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 216, 947, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 248, 508, 261], \"label\": \"Text\"}, {\"bbox_2d\": [79, 286, 957, 365], \"label\": \"Text\"}, {\"bbox_2d\": [79, 391, 967, 419], \"label\": \"Text\"}, {\"bbox_2d\": [79, 424, 974, 450], \"label\": \"Text\"}, {\"bbox_2d\": [79, 456, 977, 511], \"label\": \"Text\"}, {\"bbox_2d\": [79, 665, 970, 692], \"label\": \"Text\"}, {\"bbox_2d\": [79, 697, 974, 714], \"label\": \"Text\"}, {\"bbox_2d\": [79, 736, 972, 805], \"label\": \"Text\"}, {\"bbox_2d\": [79, 609, 495, 624], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 630, 862, 645], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 651, 348, 661], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 516, 193, 528], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 534, 402, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 553, 607, 566], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 572, 304, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [106, 591, 677, 605], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [53, 139, 284, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 612, 495, 623], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 650, 347, 661], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 631, 862, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 594, 678, 604], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 575, 304, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 556, 607, 566], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 537, 402, 547], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 518, 193, 528], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 669, 971, 693], \"label\": \"Text\"}, {\"bbox_2d\": [53, 720, 190, 731], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 740, 974, 804], \"label\": \"Text\"}, {\"bbox_2d\": [79, 702, 977, 712], \"label\": \"Text\"}, {\"bbox_2d\": [79, 459, 979, 509], \"label\": \"Text\"}, {\"bbox_2d\": [79, 427, 975, 451], \"label\": \"Text\"}, {\"bbox_2d\": [79, 395, 968, 418], \"label\": \"Text\"}, {\"bbox_2d\": [53, 269, 483, 280], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 375, 354, 385], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 160, 619, 171], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 199, 921, 210], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 218, 947, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 250, 508, 261], \"label\": \"Text\"}, {\"bbox_2d\": [79, 180, 238, 191], \"label\": \"Text\"}, {\"bbox_2d\": [79, 289, 958, 366], \"label\": \"Text\"}, {\"bbox_2d\": [867, 1003, 1004, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [812, 1001, 862, 1016], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [77, 71, 360, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 173, 346, 184], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 82, 917, 162], \"label\": \"Text\"}, {\"bbox_2d\": [287, 777, 450, 787], \"label\": \"Text\"}, {\"bbox_2d\": [287, 804, 584, 949], \"label\": \"Text\"}, {\"bbox_2d\": [610, 804, 890, 949], \"label\": \"Text\"}, {\"bbox_2d\": [77, 906, 215, 917], \"label\": \"Caption\"}, {\"bbox_2d\": [77, 749, 260, 897], \"label\": \"Picture\"}, {\"bbox_2d\": [287, 750, 600, 759], \"label\": \"Section-header\"}, {\"bbox_2d\": [937, 982, 981, 994], \"label\": \"Page-footer\"}, {\"bbox_2d\": [76, 197, 925, 696], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [76, 70, 360, 80], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 83, 916, 161], \"label\": \"Text\"}, {\"bbox_2d\": [76, 174, 345, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [74, 196, 919, 696], \"label\": \"Table\"}, {\"bbox_2d\": [79, 905, 215, 914], \"label\": \"Caption\"}, {\"bbox_2d\": [74, 748, 260, 897], \"label\": \"Picture\"}, {\"bbox_2d\": [284, 749, 598, 760], \"label\": \"Section-header\"}, {\"bbox_2d\": [286, 776, 453, 787], \"label\": \"Text\"}, {\"bbox_2d\": [286, 803, 582, 948], \"label\": \"Text\"}, {\"bbox_2d\": [610, 804, 890, 948], \"label\": \"Text\"}, {\"bbox_2d\": [966, 982, 981, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [73, 207, 914, 258], \"label\": \"Text\"}, {\"bbox_2d\": [73, 271, 455, 279], \"label\": \"Section-header\"}, {\"bbox_2d\": [71, 292, 596, 816], \"label\": \"Table\"}, {\"bbox_2d\": [598, 292, 916, 816], \"label\": \"Table\"}, {\"bbox_2d\": [5, 767, 26, 862], \"label\": \"Page-header\"}, {\"bbox_2d\": [52, 1002, 222, 1011], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [74, 209, 913, 258], \"label\": \"Text\"}, {\"bbox_2d\": [12, 770, 22, 860], \"label\": \"Page-header\"}, {\"bbox_2d\": [74, 271, 455, 282], \"label\": \"List-item\"}, {\"bbox_2d\": [63, 289, 925, 821], \"label\": \"Table\"}, {\"bbox_2d\": [51, 1002, 223, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 49, 296, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 229, 897, 259], \"label\": \"Text\"}, {\"bbox_2d\": [147, 261, 908, 290], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 292, 919, 323], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 326, 919, 407], \"label\": \"Text\"}, {\"bbox_2d\": [184, 409, 338, 423], \"label\": \"Text\"}, {\"bbox_2d\": [121, 426, 910, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 459, 919, 524], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 558, 379, 572], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 575, 884, 606], \"label\": \"Text\"}, {\"bbox_2d\": [121, 608, 919, 655], \"label\": \"Text\"}, {\"bbox_2d\": [121, 657, 912, 705], \"label\": \"Text\"}, {\"bbox_2d\": [121, 707, 547, 721], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 724, 905, 772], \"label\": \"Text\"}, {\"bbox_2d\": [121, 774, 909, 821], \"label\": \"Text\"}, {\"bbox_2d\": [121, 823, 901, 871], \"label\": \"Text\"}, {\"bbox_2d\": [121, 873, 910, 904], \"label\": \"Text\"}, {\"bbox_2d\": [121, 906, 919, 937], \"label\": \"Text\"}, {\"bbox_2d\": [121, 973, 445, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 128, 918, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [484, 526, 669, 557], \"label\": \"Section-header\"}, {\"bbox_2d\": [184, 873, 910, 895], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 126, 928, 176], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 176, 935, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [153, 259, 915, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [149, 292, 925, 325], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 424, 950, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 457, 960, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 226, 931, 258], \"label\": \"Text\"}, {\"bbox_2d\": [121, 325, 931, 407], \"label\": \"Text\"}, {\"bbox_2d\": [182, 408, 336, 424], \"label\": \"Text\"}, {\"bbox_2d\": [121, 573, 908, 606], \"label\": \"Text\"}, {\"bbox_2d\": [121, 606, 954, 655], \"label\": \"Text\"}, {\"bbox_2d\": [121, 656, 942, 705], \"label\": \"Text\"}, {\"bbox_2d\": [182, 722, 661, 738], \"label\": \"Text\"}, {\"bbox_2d\": [121, 739, 937, 771], \"label\": \"Text\"}, {\"bbox_2d\": [121, 772, 949, 821], \"label\": \"Text\"}, {\"bbox_2d\": [121, 821, 938, 871], \"label\": \"Text\"}, {\"bbox_2d\": [121, 871, 943, 904], \"label\": \"Text\"}, {\"bbox_2d\": [121, 904, 955, 937], \"label\": \"Text\"}, {\"bbox_2d\": [121, 705, 556, 722], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 556, 390, 573], \"label\": \"Section-header\"}, {\"bbox_2d\": [483, 523, 557, 539], \"label\": \"Section-header\"}, {\"bbox_2d\": [288, 540, 752, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 84, 923, 113], \"label\": \"Text\"}, {\"bbox_2d\": [293, 119, 848, 132], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 140, 929, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 174, 922, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 242, 841, 254], \"label\": \"List-item\"}, {\"bbox_2d\": [319, 261, 929, 305], \"label\": \"List-item\"}, {\"bbox_2d\": [319, 312, 935, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 379, 935, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 429, 929, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 482, 865, 497], \"label\": \"Text\"}, {\"bbox_2d\": [293, 566, 925, 642], \"label\": \"Text\"}, {\"bbox_2d\": [293, 703, 929, 857], \"label\": \"Text\"}, {\"bbox_2d\": [293, 661, 899, 672], \"label\": \"Text\"}, {\"bbox_2d\": [293, 874, 886, 887], \"label\": \"Text\"}, {\"bbox_2d\": [293, 930, 929, 958], \"label\": \"Text\"}, {\"bbox_2d\": [702, 980, 937, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [218, 906, 648, 922], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 688, 388, 701], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 551, 417, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 460, 391, 477], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 502, 408, 515], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 522, 385, 534], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 499, 407, 512], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 919, 114], \"label\": \"Text\"}, {\"bbox_2d\": [293, 484, 861, 498], \"label\": \"Text\"}, {\"bbox_2d\": [293, 567, 917, 642], \"label\": \"Text\"}, {\"bbox_2d\": [293, 660, 897, 674], \"label\": \"Text\"}, {\"bbox_2d\": [293, 704, 924, 858], \"label\": \"Text\"}, {\"bbox_2d\": [293, 876, 881, 889], \"label\": \"Text\"}, {\"bbox_2d\": [293, 931, 925, 960], \"label\": \"Text\"}, {\"bbox_2d\": [703, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [217, 907, 645, 924], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 688, 382, 702], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 551, 414, 565], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 461, 388, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 118, 841, 133], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 140, 926, 169], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 175, 917, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 242, 838, 255], \"label\": \"List-item\"}, {\"bbox_2d\": [320, 259, 928, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [320, 310, 930, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 379, 929, 423], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 429, 924, 443], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 502, 403, 517], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 521, 382, 537], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [116, 50, 287, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [236, 92, 856, 294], \"label\": \"Text\"}, {\"bbox_2d\": [236, 298, 856, 407], \"label\": \"Text\"}, {\"bbox_2d\": [257, 430, 553, 482], \"label\": \"Text\"}, {\"bbox_2d\": [592, 412, 836, 426], \"label\": \"Text\"}, {\"bbox_2d\": [541, 979, 555, 989], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [115, 49, 286, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [237, 89, 869, 294], \"label\": \"Text\"}, {\"bbox_2d\": [238, 295, 866, 407], \"label\": \"Text\"}, {\"bbox_2d\": [592, 410, 843, 425], \"label\": \"Text\"}, {\"bbox_2d\": [259, 428, 560, 481], \"label\": \"Text\"}, {\"bbox_2d\": [543, 976, 560, 991], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [789, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 155, 746, 165], \"label\": \"Text\"}, {\"bbox_2d\": [282, 155, 351, 165], \"label\": \"Text\"}, {\"bbox_2d\": [157, 171, 491, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 178, 859, 946], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 522, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [159, 429, 503, 942], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 154, 352, 422], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 152, 897, 660], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 667, 899, 942], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [82, 52, 178, 92], \"label\": \"Section-header\"}, {\"bbox_2d\": [821, 1004, 955, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [81, 110, 314, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 171, 310, 200], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 218, 241, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 279, 231, 307], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 325, 211, 353], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 371, 274, 414], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 432, 210, 460], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 478, 220, 496], \"label\": \"List-item\"}, {\"bbox_2d\": [81, 525, 317, 583], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 110, 549, 185], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 203, 554, 232], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 249, 510, 277], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 295, 497, 323], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 341, 472, 369], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 387, 483, 414], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 432, 466, 460], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 478, 498, 506], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 525, 556, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [379, 571, 496, 599], \"label\": \"List-item\"}, {\"bbox_2d\": [676, 110, 843, 163], \"label\": \"List-item\"}, {\"bbox_2d\": [676, 157, 794, 185], \"label\": \"List-item\"}, {\"bbox_2d\": [676, 203, 843, 245], \"label\": \"List-item\"}, {\"bbox_2d\": [676, 263, 784, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [676, 310, 894, 397], \"label\": \"List-item\"}, {\"bbox_2d\": [676, 415, 800, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [676, 507, 718, 520], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [81, 54, 176, 93], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 112, 246, 153], \"label\": \"Text\"}, {\"bbox_2d\": [81, 173, 310, 199], \"label\": \"Text\"}, {\"bbox_2d\": [81, 219, 241, 260], \"label\": \"Text\"}, {\"bbox_2d\": [81, 281, 231, 306], \"label\": \"Text\"}, {\"bbox_2d\": [81, 327, 211, 352], \"label\": \"Text\"}, {\"bbox_2d\": [81, 373, 273, 414], \"label\": \"Text\"}, {\"bbox_2d\": [81, 434, 211, 460], \"label\": \"Text\"}, {\"bbox_2d\": [81, 481, 220, 506], \"label\": \"Text\"}, {\"bbox_2d\": [81, 527, 316, 583], \"label\": \"Text\"}, {\"bbox_2d\": [379, 112, 483, 137], \"label\": \"Text\"}, {\"bbox_2d\": [379, 158, 545, 183], \"label\": \"Text\"}, {\"bbox_2d\": [379, 204, 554, 229], \"label\": \"Text\"}, {\"bbox_2d\": [379, 250, 511, 276], \"label\": \"Text\"}, {\"bbox_2d\": [379, 296, 498, 322], \"label\": \"Text\"}, {\"bbox_2d\": [379, 342, 477, 368], \"label\": \"Text\"}, {\"bbox_2d\": [379, 388, 482, 414], \"label\": \"Text\"}, {\"bbox_2d\": [379, 434, 466, 460], \"label\": \"Text\"}, {\"bbox_2d\": [379, 481, 497, 506], \"label\": \"Text\"}, {\"bbox_2d\": [379, 527, 556, 552], \"label\": \"Text\"}, {\"bbox_2d\": [379, 573, 496, 598], \"label\": \"Text\"}, {\"bbox_2d\": [677, 112, 800, 137], \"label\": \"Text\"}, {\"bbox_2d\": [677, 158, 794, 183], \"label\": \"Text\"}, {\"bbox_2d\": [677, 204, 843, 245], \"label\": \"Text\"}, {\"bbox_2d\": [677, 265, 784, 291], \"label\": \"Text\"}, {\"bbox_2d\": [677, 311, 898, 399], \"label\": \"Text\"}, {\"bbox_2d\": [677, 419, 806, 445], \"label\": \"Text\"}, {\"bbox_2d\": [677, 465, 794, 491], \"label\": \"Text\"}, {\"bbox_2d\": [672, 511, 719, 520], \"label\": \"Text\"}, {\"bbox_2d\": [822, 1005, 955, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 220], \"label\": \"Text\"}, {\"bbox_2d\": [121, 233, 911, 439], \"label\": \"Text\"}, {\"bbox_2d\": [121, 452, 911, 795], \"label\": \"Text\"}, {\"bbox_2d\": [121, 808, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 911, 220], \"label\": \"Text\"}, {\"bbox_2d\": [121, 233, 911, 438], \"label\": \"Text\"}, {\"bbox_2d\": [121, 452, 911, 794], \"label\": \"Text\"}, {\"bbox_2d\": [121, 808, 911, 958], \"label\": \"Text\"}, {\"bbox_2d\": [501, 989, 511, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [88, 233, 474, 381], \"label\": \"Text\"}, {\"bbox_2d\": [88, 435, 477, 695], \"label\": \"Text\"}, {\"bbox_2d\": [88, 726, 467, 828], \"label\": \"Text\"}, {\"bbox_2d\": [513, 851, 899, 909], \"label\": \"Text\"}, {\"bbox_2d\": [513, 797, 884, 818], \"label\": \"Text\"}, {\"bbox_2d\": [513, 753, 899, 782], \"label\": \"Text\"}, {\"bbox_2d\": [513, 698, 899, 738], \"label\": \"Text\"}, {\"bbox_2d\": [513, 661, 904, 681], \"label\": \"Text\"}, {\"bbox_2d\": [513, 256, 877, 293], \"label\": \"Text\"}, {\"bbox_2d\": [513, 233, 760, 248], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 410, 271, 424], \"label\": \"Section-header\"}, {\"bbox_2d\": [1018, 141, 1027, 153], \"label\": \"Page-header\"}, {\"bbox_2d\": [237, 133, 906, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [523, 323, 906, 628], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [237, 126, 907, 158], \"label\": \"Page-header\"}, {\"bbox_2d\": [1016, 137, 1028, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [512, 231, 760, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 409, 272, 426], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 229, 473, 381], \"label\": \"Text\"}, {\"bbox_2d\": [87, 431, 479, 695], \"label\": \"Text\"}, {\"bbox_2d\": [87, 722, 469, 829], \"label\": \"Text\"}, {\"bbox_2d\": [512, 662, 904, 682], \"label\": \"Text\"}, {\"bbox_2d\": [512, 696, 902, 738], \"label\": \"Text\"}, {\"bbox_2d\": [512, 752, 897, 783], \"label\": \"Text\"}, {\"bbox_2d\": [512, 797, 887, 817], \"label\": \"Text\"}, {\"bbox_2d\": [512, 846, 898, 909], \"label\": \"Text\"}, {\"bbox_2d\": [512, 253, 876, 293], \"label\": \"Text\"}, {\"bbox_2d\": [519, 316, 906, 635], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 97, 470, 115], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 467, 415, 483], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 790, 411, 804], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 140, 926, 214], \"label\": \"Text\"}, {\"bbox_2d\": [241, 239, 914, 328], \"label\": \"Text\"}, {\"bbox_2d\": [241, 345, 675, 358], \"label\": \"Text\"}, {\"bbox_2d\": [231, 384, 926, 412], \"label\": \"Text\"}, {\"bbox_2d\": [231, 428, 914, 440], \"label\": \"Text\"}, {\"bbox_2d\": [231, 501, 653, 513], \"label\": \"Text\"}, {\"bbox_2d\": [241, 665, 915, 755], \"label\": \"Text\"}, {\"bbox_2d\": [231, 824, 648, 836], \"label\": \"Text\"}, {\"bbox_2d\": [231, 846, 698, 858], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 868, 752, 879], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 890, 909, 918], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 929, 871, 957], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 524, 535, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 546, 606, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 569, 787, 581], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 591, 701, 603], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 613, 908, 641], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 97, 470, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 467, 415, 483], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 789, 411, 805], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 139, 926, 214], \"label\": \"Text\"}, {\"bbox_2d\": [241, 238, 916, 329], \"label\": \"Text\"}, {\"bbox_2d\": [241, 346, 680, 358], \"label\": \"Text\"}, {\"bbox_2d\": [231, 384, 925, 411], \"label\": \"Text\"}, {\"bbox_2d\": [231, 428, 913, 440], \"label\": \"Text\"}, {\"bbox_2d\": [231, 501, 653, 513], \"label\": \"Text\"}, {\"bbox_2d\": [241, 665, 929, 755], \"label\": \"Text\"}, {\"bbox_2d\": [231, 823, 648, 835], \"label\": \"Text\"}, {\"bbox_2d\": [231, 928, 875, 956], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 890, 911, 918], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 868, 753, 880], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 845, 698, 857], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 613, 911, 640], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 590, 699, 602], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 568, 787, 580], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 546, 606, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 524, 540, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [863, 1000, 973, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [642, 714, 816, 728], \"label\": \"Text\"}, {\"bbox_2d\": [271, 751, 652, 826], \"label\": \"Text\"}, {\"bbox_2d\": [91, 751, 361, 926], \"label\": \"Text\"}, {\"bbox_2d\": [91, 925, 360, 963], \"label\": \"Text\"}, {\"bbox_2d\": [673, 796, 938, 925], \"label\": \"Table\"}, {\"bbox_2d\": [381, 847, 633, 963], \"label\": \"Table\"}, {\"bbox_2d\": [672, 489, 937, 648], \"label\": \"Table\"}, {\"bbox_2d\": [381, 456, 646, 585], \"label\": \"Table\"}, {\"bbox_2d\": [381, 210, 641, 350], \"label\": \"Table\"}, {\"bbox_2d\": [672, 210, 937, 355], \"label\": \"Table\"}, {\"bbox_2d\": [644, 129, 705, 163], \"label\": \"Text\"}, {\"bbox_2d\": [381, 166, 651, 190], \"label\": \"Text\"}, {\"bbox_2d\": [91, 166, 360, 238], \"label\": \"Text\"}, {\"bbox_2d\": [91, 240, 360, 350], \"label\": \"Text\"}, {\"bbox_2d\": [91, 458, 361, 563], \"label\": \"Text\"}, {\"bbox_2d\": [91, 569, 360, 628], \"label\": \"Text\"}, {\"bbox_2d\": [381, 166, 651, 190], \"label\": \"Text\"}, {\"bbox_2d\": [91, 419, 206, 444], \"label\": \"Picture\"}, {\"bbox_2d\": [622, 417, 815, 428], \"label\": \"Text\"}, {\"bbox_2d\": [88, 707, 476, 738], \"label\": \"Picture\"}, {\"bbox_2d\": [381, 166, 651, 190], \"label\": \"Text\"}, {\"bbox_2d\": [672, 210, 937, 355], \"label\": \"Table\"}, {\"bbox_2d\": [381, 210, 641, 350], \"label\": \"Table\"}, {\"bbox_2d\": [672, 129, 937, 200], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 135, 554, 149], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 167, 362, 239], \"label\": \"Text\"}, {\"bbox_2d\": [91, 242, 365, 350], \"label\": \"Text\"}, {\"bbox_2d\": [91, 459, 363, 567], \"label\": \"Text\"}, {\"bbox_2d\": [91, 570, 367, 629], \"label\": \"Text\"}, {\"bbox_2d\": [91, 754, 366, 925], \"label\": \"Text\"}, {\"bbox_2d\": [91, 928, 362, 962], \"label\": \"Text\"}, {\"bbox_2d\": [382, 754, 652, 826], \"label\": \"Text\"}, {\"bbox_2d\": [382, 167, 653, 189], \"label\": \"Text\"}, {\"bbox_2d\": [86, 114, 555, 161], \"label\": \"Picture\"}, {\"bbox_2d\": [823, 108, 940, 200], \"label\": \"Picture\"}, {\"bbox_2d\": [823, 396, 940, 489], \"label\": \"Picture\"}, {\"bbox_2d\": [822, 696, 941, 786], \"label\": \"Picture\"}, {\"bbox_2d\": [670, 852, 941, 921], \"label\": \"Table\"}, {\"bbox_2d\": [863, 999, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [671, 532, 941, 620], \"label\": \"Table\"}, {\"bbox_2d\": [671, 258, 941, 354], \"label\": \"Table\"}, {\"bbox_2d\": [670, 218, 939, 239], \"label\": \"Table\"}, {\"bbox_2d\": [672, 250, 864, 257], \"label\": \"Section-header\"}, {\"bbox_2d\": [672, 210, 822, 217], \"label\": \"Section-header\"}, {\"bbox_2d\": [670, 802, 940, 835], \"label\": \"Table\"}, {\"bbox_2d\": [671, 499, 941, 521], \"label\": \"Table\"}, {\"bbox_2d\": [89, 417, 208, 442], \"label\": \"Picture\"}, {\"bbox_2d\": [84, 698, 475, 744], \"label\": \"Picture\"}, {\"bbox_2d\": [731, 147, 814, 155], \"label\": \"Text\"}, {\"bbox_2d\": [381, 211, 601, 231], \"label\": \"Text\"}, {\"bbox_2d\": [388, 270, 587, 290], \"label\": \"Text\"}, {\"bbox_2d\": [388, 294, 595, 313], \"label\": \"Text\"}, {\"bbox_2d\": [381, 317, 639, 337], \"label\": \"Text\"}, {\"bbox_2d\": [381, 341, 543, 349], \"label\": \"Text\"}, {\"bbox_2d\": [381, 258, 451, 267], \"label\": \"Text\"}, {\"bbox_2d\": [381, 247, 574, 255], \"label\": \"Text\"}, {\"bbox_2d\": [381, 235, 585, 243], \"label\": \"Text\"}, {\"bbox_2d\": [382, 458, 637, 467], \"label\": \"Text\"}, {\"bbox_2d\": [382, 470, 649, 490], \"label\": \"Text\"}, {\"bbox_2d\": [382, 494, 578, 502], \"label\": \"Text\"}, {\"bbox_2d\": [382, 505, 452, 514], \"label\": \"Text\"}, {\"bbox_2d\": [388, 517, 627, 537], \"label\": \"Text\"}, {\"bbox_2d\": [388, 541, 636, 560], \"label\": \"Text\"}, {\"bbox_2d\": [382, 564, 578, 572], \"label\": \"Text\"}, {\"bbox_2d\": [382, 576, 544, 584], \"label\": \"Text\"}, {\"bbox_2d\": [381, 848, 636, 868], \"label\": \"Text\"}, {\"bbox_2d\": [381, 872, 521, 880], \"label\": \"Text\"}, {\"bbox_2d\": [381, 884, 559, 892], \"label\": \"Text\"}, {\"bbox_2d\": [381, 895, 451, 904], \"label\": \"Text\"}, {\"bbox_2d\": [387, 907, 660, 927], \"label\": \"Text\"}, {\"bbox_2d\": [381, 931, 639, 951], \"label\": \"Text\"}, {\"bbox_2d\": [381, 954, 543, 962], \"label\": \"Text\"}, {\"bbox_2d\": [672, 843, 864, 850], \"label\": \"Section-header\"}, {\"bbox_2d\": [672, 795, 822, 802], \"label\": \"Section-header\"}, {\"bbox_2d\": [645, 717, 814, 727], \"label\": \"Section-header\"}, {\"bbox_2d\": [672, 524, 865, 531], \"label\": \"Section-header\"}, {\"bbox_2d\": [672, 490, 822, 497], \"label\": \"Section-header\"}, {\"bbox_2d\": [623, 419, 814, 430], \"label\": \"Section-header\"}, {\"bbox_2d\": [663, 131, 814, 141], \"label\": \"Section-header\"}, {\"bbox_2d\": [672, 626, 791, 651], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [87, 108, 487, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 227, 563, 359], \"label\": \"Text\"}, {\"bbox_2d\": [49, 654, 973, 904], \"label\": \"Text\"}, {\"bbox_2d\": [23, 1011, 27, 1022], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 114, 485, 198], \"label\": \"Picture\"}, {\"bbox_2d\": [93, 226, 566, 357], \"label\": \"Text\"}, {\"bbox_2d\": [49, 649, 991, 904], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [127, 153, 491, 189], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 201, 352, 215], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 218, 500, 454], \"label\": \"Text\"}, {\"bbox_2d\": [119, 458, 497, 500], \"label\": \"Text\"}, {\"bbox_2d\": [119, 772, 500, 859], \"label\": \"Text\"}, {\"bbox_2d\": [119, 893, 497, 949], \"label\": \"Text\"}, {\"bbox_2d\": [119, 756, 456, 769], \"label\": \"Section-header\"}, {\"bbox_2d\": [544, 574, 839, 588], \"label\": \"Section-header\"}, {\"bbox_2d\": [544, 218, 610, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [567, 257, 899, 282], \"label\": \"Section-header\"}, {\"bbox_2d\": [544, 151, 920, 194], \"label\": \"Text\"}, {\"bbox_2d\": [544, 286, 903, 350], \"label\": \"Text\"}, {\"bbox_2d\": [567, 354, 903, 541], \"label\": \"Text\"}, {\"bbox_2d\": [544, 589, 920, 633], \"label\": \"Text\"}, {\"bbox_2d\": [544, 636, 920, 722], \"label\": \"Text\"}, {\"bbox_2d\": [544, 726, 920, 766], \"label\": \"Text\"}, {\"bbox_2d\": [545, 796, 907, 952], \"label\": \"Picture\"}, {\"bbox_2d\": [567, 390, 697, 534], \"label\": \"Picture\"}, {\"bbox_2d\": [175, 517, 440, 540], \"label\": \"Picture\"}, {\"bbox_2d\": [134, 544, 485, 731], \"label\": \"Picture\"}, {\"bbox_2d\": [58, 999, 170, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [119, 876, 247, 889], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [120, 203, 353, 214], \"label\": \"Section-header\"}, {\"bbox_2d\": [554, 226, 611, 238], \"label\": \"Section-header\"}, {\"bbox_2d\": [544, 152, 924, 192], \"label\": \"Text\"}, {\"bbox_2d\": [120, 219, 500, 454], \"label\": \"Text\"}, {\"bbox_2d\": [120, 460, 500, 499], \"label\": \"Text\"}, {\"bbox_2d\": [120, 153, 494, 188], \"label\": \"Section-header\"}, {\"bbox_2d\": [563, 258, 901, 281], \"label\": \"Section-header\"}, {\"bbox_2d\": [544, 576, 840, 587], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 758, 456, 770], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 878, 248, 890], \"label\": \"Section-header\"}, {\"bbox_2d\": [58, 999, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [120, 895, 500, 949], \"label\": \"Text\"}, {\"bbox_2d\": [120, 775, 500, 859], \"label\": \"Text\"}, {\"bbox_2d\": [544, 728, 924, 767], \"label\": \"Text\"}, {\"bbox_2d\": [544, 638, 924, 722], \"label\": \"Text\"}, {\"bbox_2d\": [544, 593, 924, 632], \"label\": \"Text\"}, {\"bbox_2d\": [563, 286, 907, 350], \"label\": \"Text\"}, {\"bbox_2d\": [563, 354, 907, 377], \"label\": \"Text\"}, {\"bbox_2d\": [717, 382, 909, 541], \"label\": \"Text\"}, {\"bbox_2d\": [563, 388, 698, 536], \"label\": \"Picture\"}, {\"bbox_2d\": [133, 517, 486, 734], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 796, 905, 959], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [320, 83, 897, 98], \"label\": \"Text\"}, {\"bbox_2d\": [320, 103, 653, 116], \"label\": \"List-item\"}, {\"bbox_2d\": [320, 122, 587, 136], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 142, 924, 188], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 194, 926, 253], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 260, 924, 320], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 326, 913, 356], \"label\": \"Text\"}, {\"bbox_2d\": [293, 362, 926, 408], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 436, 801, 450], \"label\": \"Text\"}, {\"bbox_2d\": [293, 479, 915, 508], \"label\": \"Text\"}, {\"bbox_2d\": [293, 537, 912, 567], \"label\": \"Text\"}, {\"bbox_2d\": [293, 712, 917, 742], \"label\": \"Text\"}, {\"bbox_2d\": [293, 760, 937, 773], \"label\": \"Text\"}, {\"bbox_2d\": [293, 802, 934, 832], \"label\": \"Text\"}, {\"bbox_2d\": [293, 861, 923, 922], \"label\": \"Text\"}, {\"bbox_2d\": [293, 788, 444, 802], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 846, 492, 860], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 522, 404, 536], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 463, 332, 477], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 421, 336, 435], \"label\": \"Section-header\"}, {\"bbox_2d\": [346, 606, 790, 694], \"label\": \"Text\"}, {\"bbox_2d\": [293, 585, 392, 599], \"label\": \"Text\"}, {\"bbox_2d\": [293, 895, 913, 922], \"label\": \"Text\"}, {\"bbox_2d\": [639, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [320, 85, 897, 98], \"label\": \"Text\"}, {\"bbox_2d\": [293, 437, 798, 451], \"label\": \"Text\"}, {\"bbox_2d\": [293, 480, 910, 509], \"label\": \"Text\"}, {\"bbox_2d\": [293, 539, 907, 568], \"label\": \"Text\"}, {\"bbox_2d\": [293, 586, 392, 599], \"label\": \"Text\"}, {\"bbox_2d\": [293, 714, 914, 743], \"label\": \"Text\"}, {\"bbox_2d\": [293, 761, 931, 774], \"label\": \"Text\"}, {\"bbox_2d\": [293, 804, 928, 833], \"label\": \"Text\"}, {\"bbox_2d\": [293, 863, 914, 923], \"label\": \"Text\"}, {\"bbox_2d\": [640, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [345, 603, 796, 696], \"label\": \"Table\"}, {\"bbox_2d\": [293, 523, 403, 538], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 465, 332, 479], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 422, 335, 436], \"label\": \"Section-header\"}, {\"bbox_2d\": [320, 102, 653, 118], \"label\": \"List-item\"}, {\"bbox_2d\": [320, 122, 587, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [320, 141, 920, 188], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 195, 925, 255], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 261, 923, 321], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 363, 920, 407], \"label\": \"List-item\"}, {\"bbox_2d\": [320, 328, 908, 357], \"label\": \"Text\"}, {\"bbox_2d\": [293, 847, 492, 862], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 788, 443, 803], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [19, 60, 222, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [15, 104, 431, 307], \"label\": \"Text\"}, {\"bbox_2d\": [19, 309, 427, 493], \"label\": \"Text\"}, {\"bbox_2d\": [15, 527, 427, 674], \"label\": \"Text\"}, {\"bbox_2d\": [19, 678, 427, 936], \"label\": \"Text\"}, {\"bbox_2d\": [33, 506, 427, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [34, 939, 427, 975], \"label\": \"List-item\"}, {\"bbox_2d\": [433, 985, 460, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [467, 808, 880, 974], \"label\": \"Text\"}, {\"bbox_2d\": [508, 735, 880, 805], \"label\": \"Text\"}, {\"bbox_2d\": [536, 677, 702, 717], \"label\": \"Formula\"}, {\"bbox_2d\": [467, 720, 522, 734], \"label\": \"Text\"}, {\"bbox_2d\": [467, 614, 880, 684], \"label\": \"Text\"}, {\"bbox_2d\": [467, 287, 880, 611], \"label\": \"Text\"}, {\"bbox_2d\": [486, 230, 880, 284], \"label\": \"List-item\"}, {\"bbox_2d\": [486, 141, 880, 229], \"label\": \"List-item\"}, {\"bbox_2d\": [486, 104, 880, 137], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [19, 60, 244, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [487, 102, 889, 137], \"label\": \"Text\"}, {\"bbox_2d\": [487, 138, 897, 229], \"label\": \"Text\"}, {\"bbox_2d\": [16, 103, 439, 307], \"label\": \"Text\"}, {\"bbox_2d\": [487, 230, 889, 283], \"label\": \"Text\"}, {\"bbox_2d\": [464, 285, 897, 612], \"label\": \"Text\"}, {\"bbox_2d\": [16, 308, 440, 494], \"label\": \"Text\"}, {\"bbox_2d\": [15, 527, 436, 675], \"label\": \"Text\"}, {\"bbox_2d\": [466, 612, 888, 684], \"label\": \"Text\"}, {\"bbox_2d\": [466, 717, 529, 734], \"label\": \"Text\"}, {\"bbox_2d\": [512, 734, 761, 752], \"label\": \"List-item\"}, {\"bbox_2d\": [508, 753, 894, 788], \"label\": \"List-item\"}, {\"bbox_2d\": [512, 788, 773, 806], \"label\": \"List-item\"}, {\"bbox_2d\": [14, 676, 437, 936], \"label\": \"Text\"}, {\"bbox_2d\": [466, 807, 898, 975], \"label\": \"Text\"}, {\"bbox_2d\": [34, 939, 431, 975], \"label\": \"Text\"}, {\"bbox_2d\": [15, 503, 438, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [647, 689, 685, 706], \"label\": \"Formula\"}, {\"bbox_2d\": [434, 986, 467, 1000], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [34, 200, 61, 660], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 93, 911, 110], \"label\": \"Title\"}, {\"bbox_2d\": [342, 139, 690, 153], \"label\": \"Text\"}, {\"bbox_2d\": [288, 168, 744, 209], \"label\": \"Text\"}, {\"bbox_2d\": [408, 226, 623, 240], \"label\": \"Text\"}, {\"bbox_2d\": [121, 289, 911, 557], \"label\": \"Text\"}, {\"bbox_2d\": [121, 587, 445, 599], \"label\": \"Text\"}, {\"bbox_2d\": [471, 257, 562, 273], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 94, 911, 110], \"label\": \"Title\"}, {\"bbox_2d\": [342, 139, 690, 154], \"label\": \"Text\"}, {\"bbox_2d\": [288, 168, 744, 211], \"label\": \"Text\"}, {\"bbox_2d\": [409, 225, 623, 240], \"label\": \"Text\"}, {\"bbox_2d\": [121, 288, 911, 558], \"label\": \"Text\"}, {\"bbox_2d\": [121, 586, 444, 599], \"label\": \"Text\"}, {\"bbox_2d\": [31, 199, 61, 771], \"label\": \"Page-header\"}, {\"bbox_2d\": [471, 257, 562, 275], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [97, 58, 752, 97], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 965, 377, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [89, 127, 844, 889], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [94, 51, 748, 99], \"label\": \"Section-header\"}, {\"bbox_2d\": [93, 131, 846, 894], \"label\": \"Table\"}, {\"bbox_2d\": [92, 964, 375, 983], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [718, 71, 918, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [466, 990, 502, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [340, 895, 634, 913], \"label\": \"Caption\"}, {\"bbox_2d\": [138, 141, 844, 872], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [139, 141, 855, 876], \"label\": \"Picture\"}, {\"bbox_2d\": [338, 893, 635, 909], \"label\": \"Caption\"}, {\"bbox_2d\": [469, 987, 501, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [716, 68, 929, 99], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [104, 65, 847, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 662, 170, 687], \"label\": \"Text\"}, {\"bbox_2d\": [116, 691, 264, 872], \"label\": \"Text\"}, {\"bbox_2d\": [714, 915, 844, 975], \"label\": \"Text\"}, {\"bbox_2d\": [894, 915, 914, 930], \"label\": \"Page-footer\"}, {\"bbox_2d\": [894, 964, 999, 1016], \"label\": \"Picture\"}, {\"bbox_2d\": [207, 119, 829, 961], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [103, 69, 842, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 686, 118, 954], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 829, 155, 954], \"label\": \"Section-header\"}, {\"bbox_2d\": [175, 128, 192, 926], \"label\": \"Text\"}, {\"bbox_2d\": [854, 618, 871, 955], \"label\": \"Text\"}, {\"bbox_2d\": [207, 117, 833, 955], \"label\": \"Table\"}, {\"bbox_2d\": [895, 948, 989, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [291, 84, 790, 98], \"label\": \"Caption\"}, {\"bbox_2d\": [291, 99, 934, 509], \"label\": \"Table\"}, {\"bbox_2d\": [121, 538, 369, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [291, 577, 890, 607], \"label\": \"Text\"}, {\"bbox_2d\": [291, 615, 748, 655], \"label\": \"Text\"}, {\"bbox_2d\": [291, 664, 508, 678], \"label\": \"Caption\"}, {\"bbox_2d\": [291, 679, 934, 860], \"label\": \"Table\"}, {\"bbox_2d\": [889, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [693, 979, 883, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [576, 415, 923, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [576, 448, 923, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [576, 494, 816, 504], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 411, 542, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 393, 542, 411], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 364, 549, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 336, 524, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 300, 533, 318], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 245, 542, 258], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 201, 505, 213], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 172, 511, 199], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 127, 457, 199], \"label\": \"List-item\"}, {\"bbox_2d\": [559, 100, 704, 112], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [293, 86, 792, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 98, 931, 508], \"label\": \"Table\"}, {\"bbox_2d\": [292, 677, 932, 859], \"label\": \"Table\"}, {\"bbox_2d\": [293, 666, 510, 677], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 578, 891, 607], \"label\": \"List-item\"}, {\"bbox_2d\": [404, 617, 748, 631], \"label\": \"Text\"}, {\"bbox_2d\": [404, 641, 746, 654], \"label\": \"Text\"}, {\"bbox_2d\": [121, 540, 369, 557], \"label\": \"Section-header\"}, {\"bbox_2d\": [693, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 38, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 165], \"label\": \"Text\"}, {\"bbox_2d\": [536, 165, 951, 193], \"label\": \"Text\"}, {\"bbox_2d\": [536, 251, 951, 300], \"label\": \"Text\"}, {\"bbox_2d\": [91, 112, 506, 155], \"label\": \"Text\"}, {\"bbox_2d\": [91, 158, 506, 471], \"label\": \"Text\"}, {\"bbox_2d\": [536, 338, 951, 505], \"label\": \"Text\"}, {\"bbox_2d\": [536, 508, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 582, 506, 629], \"label\": \"Text\"}, {\"bbox_2d\": [125, 494, 506, 583], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 697, 506, 730], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 751, 506, 784], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 805, 506, 828], \"label\": \"Formula\"}, {\"bbox_2d\": [172, 841, 428, 859], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 868, 138, 880], \"label\": \"Text\"}, {\"bbox_2d\": [125, 895, 506, 947], \"label\": \"Formula\"}, {\"bbox_2d\": [546, 202, 951, 235], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 282, 951, 333], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 67, 477, 104], \"label\": \"Formula\"}, {\"bbox_2d\": [231, 367, 366, 403], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [943, 38, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 68, 472, 108], \"label\": \"Formula\"}, {\"bbox_2d\": [543, 201, 951, 247], \"label\": \"Formula\"}, {\"bbox_2d\": [233, 370, 364, 399], \"label\": \"Formula\"}, {\"bbox_2d\": [132, 485, 506, 579], \"label\": \"Formula\"}, {\"bbox_2d\": [169, 639, 428, 671], \"label\": \"Formula\"}, {\"bbox_2d\": [123, 697, 474, 728], \"label\": \"Formula\"}, {\"bbox_2d\": [136, 750, 461, 782], \"label\": \"Formula\"}, {\"bbox_2d\": [117, 804, 479, 828], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 841, 425, 865], \"label\": \"Formula\"}, {\"bbox_2d\": [122, 894, 506, 956], \"label\": \"Formula\"}, {\"bbox_2d\": [619, 303, 951, 335], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 112, 506, 155], \"label\": \"Text\"}, {\"bbox_2d\": [91, 159, 515, 358], \"label\": \"Text\"}, {\"bbox_2d\": [91, 410, 525, 471], \"label\": \"Text\"}, {\"bbox_2d\": [91, 584, 506, 628], \"label\": \"Text\"}, {\"bbox_2d\": [91, 870, 136, 882], \"label\": \"Text\"}, {\"bbox_2d\": [536, 250, 951, 294], \"label\": \"Text\"}, {\"bbox_2d\": [536, 72, 951, 162], \"label\": \"Text\"}, {\"bbox_2d\": [536, 166, 951, 193], \"label\": \"Text\"}, {\"bbox_2d\": [536, 340, 961, 508], \"label\": \"Text\"}, {\"bbox_2d\": [536, 512, 952, 946], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [123, 88, 505, 331], \"label\": \"Text\"}, {\"bbox_2d\": [123, 335, 506, 403], \"label\": \"Text\"}, {\"bbox_2d\": [123, 406, 505, 535], \"label\": \"Text\"}, {\"bbox_2d\": [123, 538, 505, 590], \"label\": \"Text\"}, {\"bbox_2d\": [123, 594, 505, 900], \"label\": \"Text\"}, {\"bbox_2d\": [123, 904, 505, 926], \"label\": \"Text\"}, {\"bbox_2d\": [555, 88, 937, 170], \"label\": \"Text\"}, {\"bbox_2d\": [555, 174, 937, 227], \"label\": \"Text\"}, {\"bbox_2d\": [555, 231, 937, 447], \"label\": \"Text\"}, {\"bbox_2d\": [555, 451, 937, 926], \"label\": \"Text\"}, {\"bbox_2d\": [521, 989, 535, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [464, 54, 595, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [306, 54, 324, 70], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [305, 55, 327, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [463, 54, 595, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [736, 55, 758, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 84, 523, 330], \"label\": \"Text\"}, {\"bbox_2d\": [123, 331, 512, 402], \"label\": \"Text\"}, {\"bbox_2d\": [123, 404, 520, 533], \"label\": \"Text\"}, {\"bbox_2d\": [123, 534, 513, 591], \"label\": \"Text\"}, {\"bbox_2d\": [123, 592, 519, 896], \"label\": \"Text\"}, {\"bbox_2d\": [123, 897, 516, 925], \"label\": \"Text\"}, {\"bbox_2d\": [554, 84, 956, 170], \"label\": \"Text\"}, {\"bbox_2d\": [554, 171, 950, 228], \"label\": \"Text\"}, {\"bbox_2d\": [554, 230, 952, 446], \"label\": \"Text\"}, {\"bbox_2d\": [554, 447, 963, 693], \"label\": \"Text\"}, {\"bbox_2d\": [554, 694, 959, 925], \"label\": \"Text\"}, {\"bbox_2d\": [525, 989, 533, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [340, 40, 961, 74], \"label\": \"Section-header\"}, {\"bbox_2d\": [110, 119, 928, 238], \"label\": \"Text\"}, {\"bbox_2d\": [110, 292, 930, 457], \"label\": \"Text\"}, {\"bbox_2d\": [111, 472, 926, 576], \"label\": \"Text\"}, {\"bbox_2d\": [110, 591, 930, 757], \"label\": \"Text\"}, {\"bbox_2d\": [111, 772, 929, 967], \"label\": \"Text\"}, {\"bbox_2d\": [777, 988, 973, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [116, 260, 176, 276], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [128, 43, 958, 74], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 263, 174, 275], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 293, 926, 455], \"label\": \"Text\"}, {\"bbox_2d\": [116, 474, 923, 576], \"label\": \"Text\"}, {\"bbox_2d\": [116, 594, 939, 756], \"label\": \"Text\"}, {\"bbox_2d\": [116, 774, 937, 967], \"label\": \"Text\"}, {\"bbox_2d\": [778, 990, 971, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [116, 121, 937, 238], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 686], \"label\": \"Text\"}, {\"bbox_2d\": [121, 698, 911, 958], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 685], \"label\": \"Text\"}, {\"bbox_2d\": [121, 697, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [127, 48, 568, 84], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 106, 295, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 267, 451, 282], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 472, 320, 486], \"label\": \"Section-header\"}, {\"bbox_2d\": [749, 987, 975, 998], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 130, 951, 180], \"label\": \"Table\"}, {\"bbox_2d\": [121, 188, 938, 242], \"label\": \"Text\"}, {\"bbox_2d\": [121, 293, 326, 304], \"label\": \"Text\"}, {\"bbox_2d\": [121, 352, 966, 392], \"label\": \"Text\"}, {\"bbox_2d\": [121, 406, 954, 446], \"label\": \"Text\"}, {\"bbox_2d\": [121, 498, 952, 566], \"label\": \"Table\"}, {\"bbox_2d\": [121, 572, 927, 612], \"label\": \"Text\"}, {\"bbox_2d\": [121, 626, 969, 679], \"label\": \"Text\"}, {\"bbox_2d\": [121, 694, 673, 705], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [130, 45, 568, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 70, 610, 85], \"label\": \"Text\"}, {\"bbox_2d\": [121, 188, 947, 242], \"label\": \"Text\"}, {\"bbox_2d\": [121, 291, 325, 305], \"label\": \"Text\"}, {\"bbox_2d\": [121, 350, 976, 391], \"label\": \"Text\"}, {\"bbox_2d\": [121, 404, 963, 445], \"label\": \"Text\"}, {\"bbox_2d\": [121, 571, 936, 611], \"label\": \"Text\"}, {\"bbox_2d\": [121, 624, 980, 678], \"label\": \"Text\"}, {\"bbox_2d\": [121, 691, 676, 705], \"label\": \"Text\"}, {\"bbox_2d\": [749, 987, 976, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 494, 325, 508], \"label\": \"Text\"}, {\"bbox_2d\": [121, 468, 321, 486], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 508, 954, 566], \"label\": \"Table\"}, {\"bbox_2d\": [118, 307, 953, 345], \"label\": \"Table\"}, {\"bbox_2d\": [119, 145, 952, 182], \"label\": \"Table\"}, {\"bbox_2d\": [121, 102, 296, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 325, 142], \"label\": \"Text\"}, {\"bbox_2d\": [121, 265, 453, 283], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 275], \"label\": \"Text\"}, {\"bbox_2d\": [121, 287, 911, 356], \"label\": \"Text\"}, {\"bbox_2d\": [121, 427, 911, 579], \"label\": \"Text\"}, {\"bbox_2d\": [121, 683, 911, 725], \"label\": \"Text\"}, {\"bbox_2d\": [121, 836, 911, 960], \"label\": \"Text\"}, {\"bbox_2d\": [171, 622, 911, 664], \"label\": \"Formula\"}, {\"bbox_2d\": [466, 743, 911, 817], \"label\": \"Formula\"}, {\"bbox_2d\": [499, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [288, 377, 911, 408], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 911, 274], \"label\": \"Text\"}, {\"bbox_2d\": [121, 288, 922, 356], \"label\": \"Text\"}, {\"bbox_2d\": [121, 428, 922, 579], \"label\": \"Text\"}, {\"bbox_2d\": [121, 684, 911, 725], \"label\": \"Text\"}, {\"bbox_2d\": [121, 837, 934, 961], \"label\": \"Text\"}, {\"bbox_2d\": [501, 989, 511, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 381, 911, 420], \"label\": \"Formula\"}, {\"bbox_2d\": [173, 622, 911, 668], \"label\": \"Formula\"}, {\"bbox_2d\": [478, 744, 911, 779], \"label\": \"Formula\"}, {\"bbox_2d\": [471, 783, 911, 819], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [394, 83, 918, 176], \"label\": \"Text\"}, {\"bbox_2d\": [333, 185, 469, 198], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 320, 384, 333], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 679, 424, 692], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 827, 474, 842], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 714, 877, 731], \"label\": \"Text\"}, {\"bbox_2d\": [333, 743, 523, 755], \"label\": \"Text\"}, {\"bbox_2d\": [333, 852, 917, 896], \"label\": \"Text\"}, {\"bbox_2d\": [333, 907, 523, 919], \"label\": \"Text\"}, {\"bbox_2d\": [333, 656, 733, 668], \"label\": \"Text\"}, {\"bbox_2d\": [333, 585, 905, 630], \"label\": \"Text\"}, {\"bbox_2d\": [333, 641, 428, 653], \"label\": \"Text\"}, {\"bbox_2d\": [333, 499, 404, 511], \"label\": \"Text\"}, {\"bbox_2d\": [333, 547, 674, 559], \"label\": \"Text\"}, {\"bbox_2d\": [333, 569, 414, 582], \"label\": \"Section-header\"}, {\"bbox_2d\": [333, 445, 904, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [333, 406, 888, 434], \"label\": \"Section-header\"}, {\"bbox_2d\": [333, 383, 523, 395], \"label\": \"Text\"}, {\"bbox_2d\": [333, 344, 863, 372], \"label\": \"Text\"}, {\"bbox_2d\": [333, 297, 558, 309], \"label\": \"Text\"}, {\"bbox_2d\": [394, 202, 905, 278], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 384, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [395, 85, 912, 176], \"label\": \"Text\"}, {\"bbox_2d\": [395, 202, 900, 278], \"label\": \"Text\"}, {\"bbox_2d\": [334, 296, 553, 310], \"label\": \"Text\"}, {\"bbox_2d\": [334, 343, 858, 372], \"label\": \"Text\"}, {\"bbox_2d\": [334, 382, 516, 396], \"label\": \"Text\"}, {\"bbox_2d\": [395, 516, 745, 529], \"label\": \"Text\"}, {\"bbox_2d\": [334, 547, 668, 560], \"label\": \"Text\"}, {\"bbox_2d\": [395, 586, 898, 631], \"label\": \"Text\"}, {\"bbox_2d\": [395, 657, 726, 670], \"label\": \"Text\"}, {\"bbox_2d\": [334, 704, 872, 732], \"label\": \"Text\"}, {\"bbox_2d\": [334, 743, 516, 756], \"label\": \"Text\"}, {\"bbox_2d\": [395, 805, 773, 819], \"label\": \"Text\"}, {\"bbox_2d\": [334, 852, 911, 897], \"label\": \"Text\"}, {\"bbox_2d\": [334, 907, 516, 920], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [334, 187, 464, 200], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 320, 377, 333], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 500, 397, 513], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 571, 407, 584], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 641, 422, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 790, 430, 803], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 829, 468, 842], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 680, 419, 693], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 766, 896, 779], \"label\": \"List-item\"}, {\"bbox_2d\": [334, 445, 902, 490], \"label\": \"List-item\"}, {\"bbox_2d\": [334, 406, 884, 435], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [891, 49, 911, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [147, 95, 591, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 179, 911, 249], \"label\": \"Text\"}, {\"bbox_2d\": [121, 260, 911, 331], \"label\": \"Text\"}, {\"bbox_2d\": [121, 399, 625, 414], \"label\": \"Text\"}, {\"bbox_2d\": [121, 482, 458, 498], \"label\": \"Text\"}, {\"bbox_2d\": [121, 567, 911, 691], \"label\": \"Text\"}, {\"bbox_2d\": [121, 703, 911, 856], \"label\": \"Text\"}, {\"bbox_2d\": [121, 868, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [273, 351, 911, 383], \"label\": \"Formula\"}, {\"bbox_2d\": [353, 437, 911, 460], \"label\": \"Formula\"}, {\"bbox_2d\": [324, 521, 911, 543], \"label\": \"Formula\"}, {\"bbox_2d\": [299, 517, 707, 537], \"label\": \"Formula\"}, {\"bbox_2d\": [273, 442, 911, 470], \"label\": \"Formula\"}, {\"bbox_2d\": [273, 351, 911, 383], \"label\": \"Formula\"}, {\"bbox_2d\": [273, 127, 911, 170], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [891, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [147, 95, 589, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 261, 910, 304], \"label\": \"Text\"}, {\"bbox_2d\": [121, 315, 277, 331], \"label\": \"Text\"}, {\"bbox_2d\": [121, 399, 625, 415], \"label\": \"Text\"}, {\"bbox_2d\": [121, 481, 458, 498], \"label\": \"Text\"}, {\"bbox_2d\": [121, 566, 910, 582], \"label\": \"Text\"}, {\"bbox_2d\": [121, 589, 911, 691], \"label\": \"Text\"}, {\"bbox_2d\": [121, 703, 911, 773], \"label\": \"Text\"}, {\"bbox_2d\": [121, 785, 911, 855], \"label\": \"Text\"}, {\"bbox_2d\": [121, 867, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [121, 178, 911, 249], \"label\": \"Text\"}, {\"bbox_2d\": [297, 127, 911, 174], \"label\": \"Formula\"}, {\"bbox_2d\": [302, 347, 911, 394], \"label\": \"Formula\"}, {\"bbox_2d\": [382, 438, 911, 468], \"label\": \"Formula\"}, {\"bbox_2d\": [325, 522, 911, 570], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 88, 928, 157], \"label\": \"Title\"}, {\"bbox_2d\": [87, 168, 214, 186], \"label\": \"Text\"}, {\"bbox_2d\": [87, 197, 347, 215], \"label\": \"Text\"}, {\"bbox_2d\": [87, 225, 156, 243], \"label\": \"Text\"}, {\"bbox_2d\": [87, 253, 922, 303], \"label\": \"Text\"}, {\"bbox_2d\": [87, 312, 160, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 339, 405, 358], \"label\": \"Text\"}, {\"bbox_2d\": [87, 367, 111, 381], \"label\": \"Text\"}, {\"bbox_2d\": [87, 391, 850, 410], \"label\": \"Text\"}, {\"bbox_2d\": [87, 410, 681, 428], \"label\": \"Text\"}, {\"bbox_2d\": [87, 442, 610, 461], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 461, 694, 481], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 481, 511, 503], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 503, 938, 539], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 541, 204, 560], \"label\": \"Text\"}, {\"bbox_2d\": [87, 560, 561, 619], \"label\": \"Text\"}, {\"bbox_2d\": [105, 619, 230, 637], \"label\": \"Text\"}, {\"bbox_2d\": [87, 637, 378, 674], \"label\": \"Text\"}, {\"bbox_2d\": [87, 682, 160, 701], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 710, 536, 730], \"label\": \"Text\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 85, 929, 158], \"label\": \"Title\"}, {\"bbox_2d\": [87, 167, 215, 186], \"label\": \"Text\"}, {\"bbox_2d\": [87, 195, 347, 215], \"label\": \"Text\"}, {\"bbox_2d\": [87, 224, 156, 243], \"label\": \"Text\"}, {\"bbox_2d\": [87, 252, 922, 301], \"label\": \"Text\"}, {\"bbox_2d\": [87, 339, 405, 359], \"label\": \"Text\"}, {\"bbox_2d\": [87, 364, 113, 383], \"label\": \"Text\"}, {\"bbox_2d\": [87, 392, 851, 441], \"label\": \"Text\"}, {\"bbox_2d\": [87, 711, 536, 731], \"label\": \"Text\"}, {\"bbox_2d\": [91, 443, 262, 462], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 464, 693, 483], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 484, 512, 504], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 505, 938, 540], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 541, 560, 596], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 598, 230, 638], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 640, 378, 674], \"label\": \"Text\"}, {\"bbox_2d\": [87, 683, 160, 702], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 310, 160, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [48, 122, 450, 342], \"label\": \"Text\"}, {\"bbox_2d\": [48, 347, 447, 453], \"label\": \"Text\"}, {\"bbox_2d\": [48, 460, 450, 661], \"label\": \"Text\"}, {\"bbox_2d\": [48, 666, 450, 895], \"label\": \"Text\"}, {\"bbox_2d\": [48, 900, 447, 976], \"label\": \"Text\"}, {\"bbox_2d\": [485, 122, 902, 305], \"label\": \"Text\"}, {\"bbox_2d\": [485, 310, 902, 778], \"label\": \"Text\"}, {\"bbox_2d\": [485, 784, 902, 979], \"label\": \"Text\"}, {\"bbox_2d\": [450, 989, 480, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [37, 62, 322, 107], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [37, 60, 333, 108], \"label\": \"Page-header\"}, {\"bbox_2d\": [486, 118, 914, 305], \"label\": \"Text\"}, {\"bbox_2d\": [51, 118, 464, 342], \"label\": \"Text\"}, {\"bbox_2d\": [51, 344, 461, 455], \"label\": \"Text\"}, {\"bbox_2d\": [486, 306, 913, 777], \"label\": \"Text\"}, {\"bbox_2d\": [52, 456, 456, 661], \"label\": \"Text\"}, {\"bbox_2d\": [50, 662, 459, 895], \"label\": \"Text\"}, {\"bbox_2d\": [485, 780, 909, 978], \"label\": \"Text\"}, {\"bbox_2d\": [32, 899, 456, 976], \"label\": \"Text\"}, {\"bbox_2d\": [450, 985, 490, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 920, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 179, 437, 194], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 368, 389, 386], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 215, 920, 257], \"label\": \"Text\"}, {\"bbox_2d\": [231, 274, 920, 302], \"label\": \"Text\"}, {\"bbox_2d\": [231, 319, 874, 330], \"label\": \"Text\"}, {\"bbox_2d\": [231, 411, 920, 438], \"label\": \"Text\"}, {\"bbox_2d\": [231, 455, 920, 497], \"label\": \"Text\"}, {\"bbox_2d\": [231, 515, 910, 558], \"label\": \"Text\"}, {\"bbox_2d\": [231, 575, 589, 586], \"label\": \"Text\"}, {\"bbox_2d\": [255, 620, 809, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 643, 561, 654], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 665, 910, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 703, 843, 714], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 748, 535, 759], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 794, 789, 804], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 816, 908, 841], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 852, 906, 863], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 875, 920, 901], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 913, 907, 955], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 598, 485, 609], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 723, 736, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 770, 524, 780], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 852, 920, 955], \"label\": \"List-item\"}, {\"bbox_2d\": [747, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 925, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 179, 438, 195], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 367, 389, 387], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 213, 926, 257], \"label\": \"Text\"}, {\"bbox_2d\": [231, 274, 925, 301], \"label\": \"Text\"}, {\"bbox_2d\": [231, 318, 873, 330], \"label\": \"Text\"}, {\"bbox_2d\": [231, 410, 926, 437], \"label\": \"Text\"}, {\"bbox_2d\": [231, 454, 926, 498], \"label\": \"Text\"}, {\"bbox_2d\": [231, 514, 917, 558], \"label\": \"Text\"}, {\"bbox_2d\": [231, 574, 589, 586], \"label\": \"Text\"}, {\"bbox_2d\": [231, 597, 485, 609], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 619, 808, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 641, 572, 653], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 663, 926, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 701, 840, 713], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 746, 536, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 768, 534, 780], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 790, 803, 802], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 813, 923, 840], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 850, 906, 862], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 873, 926, 900], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 911, 911, 954], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 724, 736, 736], \"label\": \"Text\"}, {\"bbox_2d\": [747, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [238, 125, 906, 157], \"label\": \"Page-header\"}, {\"bbox_2d\": [1017, 139, 1029, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 229, 333, 248], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 276, 221, 293], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 298, 283, 314], \"label\": \"Text\"}, {\"bbox_2d\": [513, 231, 902, 382], \"label\": \"Text\"}, {\"bbox_2d\": [513, 432, 877, 493], \"label\": \"Text\"}, {\"bbox_2d\": [513, 522, 888, 560], \"label\": \"Text\"}, {\"bbox_2d\": [513, 655, 899, 873], \"label\": \"Text\"}, {\"bbox_2d\": [88, 747, 479, 853], \"label\": \"Text\"}, {\"bbox_2d\": [88, 881, 477, 920], \"label\": \"Text\"}, {\"bbox_2d\": [513, 588, 774, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 611, 884, 627], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 724, 282, 741], \"label\": \"Section-header\"}, {\"bbox_2d\": [97, 344, 481, 699], \"label\": \"Table\"}, {\"bbox_2d\": [513, 407, 699, 425], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [1016, 137, 1028, 155], \"label\": \"Page-footer\"}, {\"bbox_2d\": [237, 126, 907, 158], \"label\": \"Page-footer\"}, {\"bbox_2d\": [512, 229, 900, 381], \"label\": \"Text\"}, {\"bbox_2d\": [512, 431, 877, 493], \"label\": \"Text\"}, {\"bbox_2d\": [512, 520, 887, 560], \"label\": \"Text\"}, {\"bbox_2d\": [512, 654, 899, 874], \"label\": \"Text\"}, {\"bbox_2d\": [87, 879, 478, 920], \"label\": \"Text\"}, {\"bbox_2d\": [87, 745, 479, 853], \"label\": \"Text\"}, {\"bbox_2d\": [87, 298, 283, 315], \"label\": \"Text\"}, {\"bbox_2d\": [87, 276, 220, 293], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 231, 333, 248], \"label\": \"Section-header\"}, {\"bbox_2d\": [512, 409, 700, 426], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 724, 281, 741], \"label\": \"Section-header\"}, {\"bbox_2d\": [512, 587, 774, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [512, 610, 884, 628], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 340, 481, 697], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [86, 28, 438, 60], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 64, 627, 76], \"label\": \"Text\"}, {\"bbox_2d\": [86, 94, 963, 186], \"label\": \"Text\"}, {\"bbox_2d\": [86, 201, 906, 230], \"label\": \"Text\"}, {\"bbox_2d\": [86, 245, 696, 258], \"label\": \"Text\"}, {\"bbox_2d\": [86, 431, 433, 444], \"label\": \"Text\"}, {\"bbox_2d\": [86, 415, 197, 429], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 582, 188, 595], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 600, 960, 676], \"label\": \"Text\"}, {\"bbox_2d\": [86, 689, 957, 721], \"label\": \"Text\"}, {\"bbox_2d\": [86, 734, 960, 796], \"label\": \"Text\"}, {\"bbox_2d\": [86, 810, 880, 822], \"label\": \"Text\"}, {\"bbox_2d\": [86, 836, 959, 882], \"label\": \"Text\"}, {\"bbox_2d\": [86, 1107, 964, 1118], \"label\": \"Page-footer\"}, {\"bbox_2d\": [86, 272, 964, 399], \"label\": \"Table\"}, {\"bbox_2d\": [86, 458, 963, 566], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [802, 996, 965, 1006], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 22, 439, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 62, 628, 78], \"label\": \"Text\"}, {\"bbox_2d\": [85, 92, 968, 187], \"label\": \"Text\"}, {\"bbox_2d\": [85, 199, 911, 230], \"label\": \"Text\"}, {\"bbox_2d\": [85, 243, 696, 258], \"label\": \"Text\"}, {\"bbox_2d\": [85, 411, 197, 430], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 429, 434, 445], \"label\": \"Text\"}, {\"bbox_2d\": [85, 579, 189, 598], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 597, 964, 676], \"label\": \"Text\"}, {\"bbox_2d\": [85, 689, 960, 720], \"label\": \"Text\"}, {\"bbox_2d\": [85, 732, 962, 795], \"label\": \"Text\"}, {\"bbox_2d\": [85, 808, 880, 823], \"label\": \"Text\"}, {\"bbox_2d\": [85, 836, 962, 883], \"label\": \"Text\"}, {\"bbox_2d\": [84, 455, 968, 569], \"label\": \"Table\"}, {\"bbox_2d\": [86, 266, 967, 400], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [119, 98, 926, 617], \"label\": \"Picture\"}, {\"bbox_2d\": [119, 643, 911, 774], \"label\": \"Caption\"}, {\"bbox_2d\": [119, 796, 911, 953], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [115, 97, 927, 618], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 628, 911, 771], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 800, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 164], \"label\": \"Text\"}, {\"bbox_2d\": [121, 178, 911, 356], \"label\": \"Text\"}, {\"bbox_2d\": [121, 369, 911, 685], \"label\": \"Text\"}, {\"bbox_2d\": [121, 774, 911, 952], \"label\": \"Text\"}, {\"bbox_2d\": [121, 729, 298, 742], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 988, 511, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 911, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 178, 911, 356], \"label\": \"Text\"}, {\"bbox_2d\": [121, 370, 922, 685], \"label\": \"Text\"}, {\"bbox_2d\": [121, 775, 911, 953], \"label\": \"Text\"}, {\"bbox_2d\": [121, 730, 298, 743], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 989, 511, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [504, 92, 523, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [787, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [675, 143, 746, 153], \"label\": \"Text\"}, {\"bbox_2d\": [675, 155, 786, 164], \"label\": \"Text\"}, {\"bbox_2d\": [206, 165, 405, 302], \"label\": \"Picture\"}, {\"bbox_2d\": [659, 168, 738, 287], \"label\": \"Picture\"}, {\"bbox_2d\": [675, 508, 804, 518], \"label\": \"Text\"}, {\"bbox_2d\": [659, 520, 804, 530], \"label\": \"Text\"}, {\"bbox_2d\": [659, 532, 804, 566], \"label\": \"Text\"}, {\"bbox_2d\": [659, 570, 804, 600], \"label\": \"Picture\"}, {\"bbox_2d\": [659, 602, 804, 632], \"label\": \"Picture\"}, {\"bbox_2d\": [659, 634, 804, 753], \"label\": \"Picture\"}, {\"bbox_2d\": [659, 755, 804, 861], \"label\": \"Picture\"}, {\"bbox_2d\": [659, 751, 674, 760], \"label\": \"Text\"}, {\"bbox_2d\": [130, 333, 503, 735], \"label\": \"Text\"}, {\"bbox_2d\": [130, 717, 503, 778], \"label\": \"Text\"}, {\"bbox_2d\": [130, 793, 503, 826], \"label\": \"Text\"}, {\"bbox_2d\": [659, 305, 895, 371], \"label\": \"Text\"}, {\"bbox_2d\": [659, 374, 895, 423], \"label\": \"Text\"}, {\"bbox_2d\": [659, 426, 895, 475], \"label\": \"Text\"}, {\"bbox_2d\": [659, 478, 895, 505], \"label\": \"Text\"}, {\"bbox_2d\": [659, 879, 895, 956], \"label\": \"Text\"}, {\"bbox_2d\": [265, 861, 342, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [475, 855, 499, 864], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [786, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 93, 520, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 334, 503, 735], \"label\": \"Text\"}, {\"bbox_2d\": [131, 745, 501, 781], \"label\": \"Text\"}, {\"bbox_2d\": [130, 790, 499, 827], \"label\": \"Text\"}, {\"bbox_2d\": [526, 879, 896, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 303, 895, 370], \"label\": \"Text\"}, {\"bbox_2d\": [526, 374, 894, 397], \"label\": \"Text\"}, {\"bbox_2d\": [526, 400, 898, 436], \"label\": \"Text\"}, {\"bbox_2d\": [526, 440, 898, 476], \"label\": \"Text\"}, {\"bbox_2d\": [659, 154, 895, 288], \"label\": \"Picture\"}, {\"bbox_2d\": [203, 157, 501, 302], \"label\": \"Picture\"}, {\"bbox_2d\": [656, 504, 896, 605], \"label\": \"Picture\"}, {\"bbox_2d\": [612, 606, 895, 745], \"label\": \"Picture\"}, {\"bbox_2d\": [630, 747, 896, 858], \"label\": \"Picture\"}, {\"bbox_2d\": [265, 855, 502, 971], \"label\": \"Picture\"}, {\"bbox_2d\": [677, 144, 746, 154], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 36, 303, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [147, 81, 906, 914], \"label\": \"Picture\"}, {\"bbox_2d\": [372, 918, 678, 940], \"label\": \"Caption\"}, {\"bbox_2d\": [504, 983, 532, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [99, 35, 322, 67], \"label\": \"Page-header\"}, {\"bbox_2d\": [142, 79, 908, 912], \"label\": \"Picture\"}, {\"bbox_2d\": [373, 916, 679, 935], \"label\": \"Caption\"}, {\"bbox_2d\": [505, 979, 537, 994], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [99, 974, 376, 990], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 377, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [38, 16, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [113, 99, 337, 111], \"label\": \"Text\"}, {\"bbox_2d\": [113, 717, 444, 728], \"label\": \"Text\"}, {\"bbox_2d\": [131, 730, 901, 759], \"label\": \"Text\"}, {\"bbox_2d\": [109, 122, 914, 524], \"label\": \"Table\"}, {\"bbox_2d\": [109, 550, 916, 694], \"label\": \"Table\"}, {\"bbox_2d\": [45, 998, 170, 1016], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [114, 116, 260, 127], \"label\": \"Text\"}, {\"bbox_2d\": [109, 128, 915, 322], \"label\": \"Table\"}, {\"bbox_2d\": [109, 330, 837, 526], \"label\": \"Table\"}, {\"bbox_2d\": [111, 561, 915, 622], \"label\": \"Table\"}, {\"bbox_2d\": [113, 550, 255, 561], \"label\": \"Text\"}, {\"bbox_2d\": [110, 631, 837, 692], \"label\": \"Table\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [131, 732, 906, 758], \"label\": \"Text\"}, {\"bbox_2d\": [114, 101, 337, 112], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 717, 448, 728], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 107, 911, 150], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 562, 911, 631], \"label\": \"Caption\"}, {\"bbox_2d\": [351, 941, 676, 957], \"label\": \"Caption\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [325, 671, 715, 890], \"label\": \"Picture\"}, {\"bbox_2d\": [325, 290, 720, 510], \"label\": \"Picture\"}, {\"bbox_2d\": [120, 151, 913, 252], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [121, 107, 911, 149], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 560, 911, 631], \"label\": \"Caption\"}, {\"bbox_2d\": [350, 941, 682, 956], \"label\": \"Caption\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 153, 912, 250], \"label\": \"Table\"}, {\"bbox_2d\": [320, 291, 713, 511], \"label\": \"Picture\"}, {\"bbox_2d\": [325, 672, 712, 892], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [21, 432, 31, 603], \"label\": \"Page-header\"}, {\"bbox_2d\": [504, 994, 532, 1010], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [503, 992, 532, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [17, 431, 36, 604], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [292, 85, 717, 98], \"label\": \"Caption\"}, {\"bbox_2d\": [644, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [290, 96, 934, 924], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [293, 86, 718, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [644, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [288, 98, 940, 931], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 739, 926, 783], \"label\": \"Text\"}, {\"bbox_2d\": [231, 703, 575, 716], \"label\": \"Caption\"}, {\"bbox_2d\": [759, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [229, 103, 926, 700], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [227, 100, 929, 701], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 706, 570, 716], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 740, 922, 783], \"label\": \"Text\"}, {\"bbox_2d\": [758, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 601, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 132, 669, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 164, 918, 272], \"label\": \"Text\"}, {\"bbox_2d\": [121, 308, 918, 430], \"label\": \"Text\"}, {\"bbox_2d\": [121, 467, 918, 543], \"label\": \"Text\"}, {\"bbox_2d\": [121, 564, 918, 591], \"label\": \"Text\"}, {\"bbox_2d\": [121, 628, 918, 782], \"label\": \"Text\"}, {\"bbox_2d\": [121, 613, 338, 622], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 293, 290, 304], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 452, 475, 463], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 941, 536, 953], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 605, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 674, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 161, 921, 273], \"label\": \"Text\"}, {\"bbox_2d\": [121, 288, 294, 305], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 448, 478, 465], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 608, 343, 624], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 304, 920, 433], \"label\": \"Text\"}, {\"bbox_2d\": [121, 464, 922, 544], \"label\": \"Text\"}, {\"bbox_2d\": [121, 560, 917, 592], \"label\": \"Text\"}, {\"bbox_2d\": [121, 624, 922, 784], \"label\": \"Text\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [197, 97, 911, 208], \"label\": \"Picture\"}, {\"bbox_2d\": [188, 140, 911, 316], \"label\": \"Formula\"}, {\"bbox_2d\": [380, 336, 911, 383], \"label\": \"Formula\"}, {\"bbox_2d\": [445, 398, 911, 451], \"label\": \"Formula\"}, {\"bbox_2d\": [164, 477, 911, 583], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 622, 911, 681], \"label\": \"Formula\"}, {\"bbox_2d\": [146, 864, 496, 885], \"label\": \"Text\"}, {\"bbox_2d\": [144, 896, 911, 973], \"label\": \"Picture\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 842, 781, 858], \"label\": \"Text\"}, {\"bbox_2d\": [121, 694, 185, 710], \"label\": \"Text\"}, {\"bbox_2d\": [121, 590, 370, 605], \"label\": \"Text\"}, {\"bbox_2d\": [121, 454, 719, 497], \"label\": \"Text\"}, {\"bbox_2d\": [121, 380, 600, 399], \"label\": \"Text\"}, {\"bbox_2d\": [121, 312, 670, 328], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [187, 97, 910, 308], \"label\": \"Formula\"}, {\"bbox_2d\": [386, 335, 910, 369], \"label\": \"Formula\"}, {\"bbox_2d\": [440, 409, 910, 443], \"label\": \"Formula\"}, {\"bbox_2d\": [162, 506, 910, 591], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 625, 910, 673], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 728, 952, 840], \"label\": \"Formula\"}, {\"bbox_2d\": [184, 901, 910, 962], \"label\": \"Formula\"}, {\"bbox_2d\": [498, 989, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 308, 780, 322], \"label\": \"Text\"}, {\"bbox_2d\": [121, 382, 595, 396], \"label\": \"Text\"}, {\"bbox_2d\": [121, 456, 911, 496], \"label\": \"Text\"}, {\"bbox_2d\": [121, 591, 427, 605], \"label\": \"Text\"}, {\"bbox_2d\": [121, 695, 182, 709], \"label\": \"Text\"}, {\"bbox_2d\": [121, 840, 668, 854], \"label\": \"Text\"}, {\"bbox_2d\": [147, 866, 578, 880], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [45, 35, 176, 46], \"label\": \"Page-header\"}, {\"bbox_2d\": [95, 123, 601, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [67, 239, 264, 433], \"label\": \"Picture\"}, {\"bbox_2d\": [67, 330, 222, 497], \"label\": \"Caption\"}, {\"bbox_2d\": [95, 552, 721, 583], \"label\": \"Section-header\"}, {\"bbox_2d\": [754, 552, 951, 744], \"label\": \"Picture\"}, {\"bbox_2d\": [95, 590, 851, 706], \"label\": \"Text\"}, {\"bbox_2d\": [95, 712, 926, 844], \"label\": \"Text\"}, {\"bbox_2d\": [95, 850, 926, 966], \"label\": \"Text\"}, {\"bbox_2d\": [754, 755, 884, 795], \"label\": \"Caption\"}, {\"bbox_2d\": [285, 276, 926, 531], \"label\": \"Text\"}, {\"bbox_2d\": [285, 239, 896, 270], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [47, 35, 177, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [94, 118, 604, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [284, 235, 898, 270], \"label\": \"Section-header\"}, {\"bbox_2d\": [69, 438, 216, 494], \"label\": \"Caption\"}, {\"bbox_2d\": [94, 548, 722, 583], \"label\": \"Section-header\"}, {\"bbox_2d\": [284, 273, 929, 513], \"label\": \"Text\"}, {\"bbox_2d\": [94, 517, 926, 530], \"label\": \"Text\"}, {\"bbox_2d\": [94, 586, 738, 704], \"label\": \"Text\"}, {\"bbox_2d\": [94, 847, 929, 965], \"label\": \"Text\"}, {\"bbox_2d\": [754, 751, 906, 794], \"label\": \"Caption\"}, {\"bbox_2d\": [94, 708, 735, 809], \"label\": \"Text\"}, {\"bbox_2d\": [94, 813, 925, 844], \"label\": \"Text\"}, {\"bbox_2d\": [69, 239, 266, 433], \"label\": \"Picture\"}, {\"bbox_2d\": [754, 550, 952, 744], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 92, 905, 136], \"label\": \"Text\"}, {\"bbox_2d\": [231, 384, 874, 414], \"label\": \"Text\"}, {\"bbox_2d\": [231, 439, 727, 452], \"label\": \"Text\"}, {\"bbox_2d\": [231, 569, 914, 628], \"label\": \"Text\"}, {\"bbox_2d\": [231, 549, 335, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 337, 527, 349], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 462, 912, 488], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 500, 912, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [635, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 155, 875, 334], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 910, 136], \"label\": \"Text\"}, {\"bbox_2d\": [241, 385, 878, 413], \"label\": \"Text\"}, {\"bbox_2d\": [231, 439, 728, 451], \"label\": \"Text\"}, {\"bbox_2d\": [231, 568, 917, 628], \"label\": \"Text\"}, {\"bbox_2d\": [635, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 549, 335, 563], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 461, 915, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 499, 915, 527], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 154, 874, 332], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 337, 526, 348], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 74, 945, 153], \"label\": \"Text\"}, {\"bbox_2d\": [91, 255, 945, 521], \"label\": \"Text\"}, {\"bbox_2d\": [91, 535, 945, 926], \"label\": \"Text\"}, {\"bbox_2d\": [121, 943, 945, 958], \"label\": \"Text\"}, {\"bbox_2d\": [508, 990, 527, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 204, 290, 224], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 75, 944, 151], \"label\": \"Text\"}, {\"bbox_2d\": [91, 204, 290, 224], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 255, 944, 519], \"label\": \"Text\"}, {\"bbox_2d\": [91, 537, 944, 926], \"label\": \"Text\"}, {\"bbox_2d\": [121, 943, 944, 957], \"label\": \"Text\"}, {\"bbox_2d\": [508, 993, 527, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [354, 68, 683, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 954, 259, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [883, 954, 934, 972], \"label\": \"Page-footer\"}, {\"bbox_2d\": [77, 131, 971, 906], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [353, 68, 686, 90], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 132, 971, 202], \"label\": \"Picture\"}, {\"bbox_2d\": [77, 238, 972, 751], \"label\": \"Picture\"}, {\"bbox_2d\": [78, 768, 598, 905], \"label\": \"Picture\"}, {\"bbox_2d\": [883, 955, 934, 972], \"label\": \"Page-footer\"}, {\"bbox_2d\": [88, 957, 261, 995], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [716, 64, 916, 96], \"label\": \"Page-header\"}, {\"bbox_2d\": [299, 952, 676, 968], \"label\": \"Caption\"}, {\"bbox_2d\": [119, 99, 845, 941], \"label\": \"Picture\"}, {\"bbox_2d\": [477, 988, 513, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [711, 71, 930, 101], \"label\": \"Page-header\"}, {\"bbox_2d\": [113, 102, 845, 944], \"label\": \"Picture\"}, {\"bbox_2d\": [302, 954, 681, 970], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [87, 68, 293, 97], \"label\": \"Page-header\"}, {\"bbox_2d\": [169, 964, 773, 983], \"label\": \"Caption\"}, {\"bbox_2d\": [167, 113, 788, 957], \"label\": \"Picture\"}, {\"bbox_2d\": [513, 994, 541, 1006], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [88, 69, 312, 100], \"label\": \"Page-header\"}, {\"bbox_2d\": [515, 993, 547, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [280, 968, 773, 982], \"label\": \"Caption\"}, {\"bbox_2d\": [155, 114, 904, 955], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 123, 893, 165], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 186, 919, 215], \"label\": \"Text\"}, {\"bbox_2d\": [293, 234, 610, 246], \"label\": \"Text\"}, {\"bbox_2d\": [293, 324, 928, 336], \"label\": \"Text\"}, {\"bbox_2d\": [293, 355, 919, 383], \"label\": \"Text\"}, {\"bbox_2d\": [121, 406, 551, 423], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 436, 430, 448], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 455, 594, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 474, 469, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 506, 845, 518], \"label\": \"Text\"}, {\"bbox_2d\": [293, 860, 925, 888], \"label\": \"Text\"}, {\"bbox_2d\": [293, 899, 554, 945], \"label\": \"Table\"}, {\"bbox_2d\": [317, 530, 783, 823], \"label\": \"Text\"}, {\"bbox_2d\": [905, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 301, 506, 305], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 274, 468, 286], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 253, 674, 265], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 893, 164], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 406, 552, 423], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 435, 429, 448], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 505, 844, 518], \"label\": \"Text\"}, {\"bbox_2d\": [317, 529, 783, 823], \"label\": \"Text\"}, {\"bbox_2d\": [293, 860, 925, 889], \"label\": \"Text\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 253, 673, 266], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 273, 467, 286], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 292, 506, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 355, 919, 384], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 452, 593, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 474, 469, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 187, 919, 216], \"label\": \"Text\"}, {\"bbox_2d\": [293, 234, 610, 247], \"label\": \"Text\"}, {\"bbox_2d\": [293, 324, 928, 337], \"label\": \"Text\"}, {\"bbox_2d\": [293, 903, 555, 947], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 84, 198, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [275, 118, 578, 132], \"label\": \"Caption\"}, {\"bbox_2d\": [272, 132, 912, 900], \"label\": \"Table\"}, {\"bbox_2d\": [101, 975, 371, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 89, 197, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [272, 120, 580, 131], \"label\": \"Caption\"}, {\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [271, 132, 912, 899], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [105, 65, 925, 960], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 974, 760, 990], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 1001, 509, 1017], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [718, 65, 942, 97], \"label\": \"Page-header\"}, {\"bbox_2d\": [104, 104, 907, 962], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 973, 762, 987], \"label\": \"Caption\"}, {\"bbox_2d\": [497, 995, 529, 1010], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [160, 891, 732, 899], \"label\": \"Text\"}, {\"bbox_2d\": [526, 988, 534, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [160, 139, 869, 176], \"label\": \"Text\"}, {\"bbox_2d\": [160, 186, 573, 199], \"label\": \"Text\"}, {\"bbox_2d\": [160, 84, 681, 118], \"label\": \"Section-header\"}, {\"bbox_2d\": [464, 54, 596, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [782, 103, 908, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [161, 212, 907, 476], \"label\": \"Table\"}, {\"bbox_2d\": [162, 834, 168, 868], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [464, 53, 596, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [213, 89, 681, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [786, 103, 906, 113], \"label\": \"Text\"}, {\"bbox_2d\": [167, 141, 868, 160], \"label\": \"Text\"}, {\"bbox_2d\": [167, 161, 778, 179], \"label\": \"Text\"}, {\"bbox_2d\": [814, 184, 906, 194], \"label\": \"Text\"}, {\"bbox_2d\": [82, 142, 89, 154], \"label\": \"Text\"}, {\"bbox_2d\": [79, 215, 92, 226], \"label\": \"Text\"}, {\"bbox_2d\": [79, 287, 92, 299], \"label\": \"Text\"}, {\"bbox_2d\": [79, 360, 92, 371], \"label\": \"Text\"}, {\"bbox_2d\": [79, 433, 92, 444], \"label\": \"Text\"}, {\"bbox_2d\": [79, 505, 92, 517], \"label\": \"Text\"}, {\"bbox_2d\": [79, 578, 92, 589], \"label\": \"Text\"}, {\"bbox_2d\": [79, 650, 92, 662], \"label\": \"Text\"}, {\"bbox_2d\": [79, 723, 92, 734], \"label\": \"Text\"}, {\"bbox_2d\": [79, 796, 92, 807], \"label\": \"Text\"}, {\"bbox_2d\": [79, 868, 92, 880], \"label\": \"Text\"}, {\"bbox_2d\": [153, 810, 160, 865], \"label\": \"Text\"}, {\"bbox_2d\": [167, 891, 731, 899], \"label\": \"Text\"}, {\"bbox_2d\": [525, 987, 534, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [162, 211, 910, 883], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 192], \"label\": \"Text\"}, {\"bbox_2d\": [121, 282, 911, 543], \"label\": \"Text\"}, {\"bbox_2d\": [121, 612, 911, 709], \"label\": \"Text\"}, {\"bbox_2d\": [121, 719, 548, 738], \"label\": \"Text\"}, {\"bbox_2d\": [121, 809, 172, 820], \"label\": \"Text\"}, {\"bbox_2d\": [121, 867, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [315, 753, 911, 786], \"label\": \"Formula\"}, {\"bbox_2d\": [462, 823, 911, 855], \"label\": \"Formula\"}, {\"bbox_2d\": [287, 561, 911, 608], \"label\": \"Formula\"}, {\"bbox_2d\": [147, 236, 421, 250], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 193], \"label\": \"Text\"}, {\"bbox_2d\": [121, 281, 911, 543], \"label\": \"Text\"}, {\"bbox_2d\": [121, 611, 911, 709], \"label\": \"Text\"}, {\"bbox_2d\": [147, 721, 547, 736], \"label\": \"Text\"}, {\"bbox_2d\": [121, 804, 172, 820], \"label\": \"Text\"}, {\"bbox_2d\": [121, 867, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [423, 752, 911, 800], \"label\": \"Formula\"}, {\"bbox_2d\": [462, 821, 911, 859], \"label\": \"Formula\"}, {\"bbox_2d\": [287, 559, 911, 601], \"label\": \"Formula\"}, {\"bbox_2d\": [147, 237, 421, 250], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [218, 83, 392, 103], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 107, 926, 137], \"label\": \"Text\"}, {\"bbox_2d\": [451, 152, 982, 207], \"label\": \"Text\"}, {\"bbox_2d\": [451, 208, 953, 250], \"label\": \"Text\"}, {\"bbox_2d\": [451, 252, 978, 321], \"label\": \"Text\"}, {\"bbox_2d\": [451, 323, 974, 368], \"label\": \"Text\"}, {\"bbox_2d\": [451, 500, 964, 556], \"label\": \"Text\"}, {\"bbox_2d\": [451, 558, 972, 642], \"label\": \"Text\"}, {\"bbox_2d\": [294, 558, 350, 570], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 498, 393, 510], \"label\": \"Section-header\"}, {\"bbox_2d\": [451, 644, 957, 669], \"label\": \"Text\"}, {\"bbox_2d\": [294, 703, 900, 733], \"label\": \"Text\"}, {\"bbox_2d\": [616, 979, 933, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [451, 485, 740, 497], \"label\": \"List-item\"}, {\"bbox_2d\": [451, 469, 899, 481], \"label\": \"List-item\"}, {\"bbox_2d\": [451, 449, 507, 461], \"label\": \"List-item\"}, {\"bbox_2d\": [451, 432, 496, 444], \"label\": \"List-item\"}, {\"bbox_2d\": [451, 410, 582, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [451, 373, 563, 385], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 325, 360, 338], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 251, 323, 263], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 208, 326, 220], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 152, 323, 164], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [293, 148, 982, 671], \"label\": \"Table\"}, {\"bbox_2d\": [217, 85, 391, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 108, 926, 137], \"label\": \"Text\"}, {\"bbox_2d\": [293, 703, 900, 732], \"label\": \"Text\"}, {\"bbox_2d\": [616, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 109, 888, 197], \"label\": \"Text\"}, {\"bbox_2d\": [147, 238, 887, 402], \"label\": \"Text\"}, {\"bbox_2d\": [147, 439, 888, 607], \"label\": \"Text\"}, {\"bbox_2d\": [147, 643, 888, 758], \"label\": \"Text\"}, {\"bbox_2d\": [147, 796, 888, 911], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 108, 887, 198], \"label\": \"Text\"}, {\"bbox_2d\": [148, 237, 886, 401], \"label\": \"Text\"}, {\"bbox_2d\": [148, 439, 888, 605], \"label\": \"Text\"}, {\"bbox_2d\": [148, 643, 888, 758], \"label\": \"Text\"}, {\"bbox_2d\": [148, 796, 887, 911], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [172, 96, 581, 109], \"label\": \"Text\"}, {\"bbox_2d\": [131, 125, 910, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 183, 910, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 238, 910, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 294, 910, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 351, 910, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 408, 910, 449], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 464, 910, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 522, 910, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 579, 861, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 607, 910, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 663, 910, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 719, 910, 762], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 777, 910, 846], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 862, 910, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 920, 910, 962], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [172, 97, 581, 110], \"label\": \"Text\"}, {\"bbox_2d\": [131, 125, 911, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 182, 911, 223], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 239, 911, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 295, 911, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 352, 911, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 409, 911, 450], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 466, 911, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 522, 911, 563], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 579, 862, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 607, 911, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 664, 911, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 721, 911, 762], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 777, 911, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 862, 911, 903], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 919, 911, 960], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [81, 111, 500, 485], \"label\": \"Text\"}, {\"bbox_2d\": [81, 563, 494, 637], \"label\": \"Text\"}, {\"bbox_2d\": [81, 649, 483, 723], \"label\": \"Text\"}, {\"bbox_2d\": [81, 735, 500, 824], \"label\": \"Text\"}, {\"bbox_2d\": [533, 683, 949, 805], \"label\": \"Text\"}, {\"bbox_2d\": [533, 515, 943, 604], \"label\": \"Text\"}, {\"bbox_2d\": [533, 228, 938, 288], \"label\": \"Text\"}, {\"bbox_2d\": [533, 111, 952, 217], \"label\": \"Text\"}, {\"bbox_2d\": [81, 513, 176, 531], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 347, 624, 363], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 314, 628, 335], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 545, 322, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 636, 628, 652], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 667, 790, 680], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 1005, 247, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [533, 366, 783, 378], \"label\": \"Caption\"}, {\"bbox_2d\": [533, 388, 954, 495], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [533, 112, 953, 216], \"label\": \"Text\"}, {\"bbox_2d\": [81, 112, 502, 483], \"label\": \"Text\"}, {\"bbox_2d\": [533, 229, 946, 287], \"label\": \"Text\"}, {\"bbox_2d\": [81, 563, 494, 636], \"label\": \"Text\"}, {\"bbox_2d\": [81, 649, 487, 722], \"label\": \"Text\"}, {\"bbox_2d\": [81, 736, 501, 824], \"label\": \"Text\"}, {\"bbox_2d\": [533, 683, 953, 803], \"label\": \"Text\"}, {\"bbox_2d\": [533, 515, 943, 603], \"label\": \"Text\"}, {\"bbox_2d\": [533, 632, 628, 650], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 512, 176, 530], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 545, 321, 558], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 666, 790, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 316, 628, 334], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 349, 624, 362], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 367, 785, 377], \"label\": \"Text\"}, {\"bbox_2d\": [530, 386, 958, 496], \"label\": \"Table\"}, {\"bbox_2d\": [81, 1005, 246, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [28, 403, 41, 631], \"label\": \"Page-header\"}, {\"bbox_2d\": [509, 994, 526, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [508, 990, 527, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 402, 43, 633], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [236, 68, 810, 83], \"label\": \"Title\"}, {\"bbox_2d\": [172, 102, 845, 131], \"label\": \"Text\"}, {\"bbox_2d\": [225, 137, 818, 157], \"label\": \"Text\"}, {\"bbox_2d\": [184, 172, 835, 197], \"label\": \"Text\"}, {\"bbox_2d\": [184, 212, 418, 224], \"label\": \"Text\"}, {\"bbox_2d\": [91, 252, 506, 329], \"label\": \"Text\"}, {\"bbox_2d\": [91, 334, 506, 617], \"label\": \"Text\"}, {\"bbox_2d\": [91, 622, 506, 757], \"label\": \"Text\"}, {\"bbox_2d\": [91, 762, 506, 868], \"label\": \"Text\"}, {\"bbox_2d\": [91, 913, 506, 928], \"label\": \"Footnote\"}, {\"bbox_2d\": [91, 928, 506, 949], \"label\": \"Footnote\"}, {\"bbox_2d\": [31, 231, 60, 739], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 252, 951, 308], \"label\": \"Text\"}, {\"bbox_2d\": [536, 308, 951, 357], \"label\": \"Text\"}, {\"bbox_2d\": [536, 395, 951, 424], \"label\": \"Text\"}, {\"bbox_2d\": [536, 466, 951, 526], \"label\": \"Text\"}, {\"bbox_2d\": [536, 578, 951, 607], \"label\": \"Text\"}, {\"bbox_2d\": [536, 701, 951, 745], \"label\": \"Text\"}, {\"bbox_2d\": [536, 846, 548, 857], \"label\": \"Text\"}, {\"bbox_2d\": [536, 876, 951, 905], \"label\": \"Formula\"}, {\"bbox_2d\": [695, 818, 951, 836], \"label\": \"Formula\"}, {\"bbox_2d\": [677, 753, 951, 790], \"label\": \"Formula\"}, {\"bbox_2d\": [607, 618, 951, 677], \"label\": \"Formula\"}, {\"bbox_2d\": [667, 540, 951, 575], \"label\": \"Formula\"}, {\"bbox_2d\": [595, 440, 951, 463], \"label\": \"Formula\"}, {\"bbox_2d\": [621, 368, 951, 384], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [236, 70, 805, 84], \"label\": \"Title\"}, {\"bbox_2d\": [291, 101, 751, 115], \"label\": \"Text\"}, {\"bbox_2d\": [173, 118, 869, 131], \"label\": \"Text\"}, {\"bbox_2d\": [224, 134, 818, 147], \"label\": \"Text\"}, {\"bbox_2d\": [415, 149, 627, 161], \"label\": \"Text\"}, {\"bbox_2d\": [184, 172, 858, 197], \"label\": \"Text\"}, {\"bbox_2d\": [184, 213, 418, 224], \"label\": \"Text\"}, {\"bbox_2d\": [91, 252, 506, 325], \"label\": \"Text\"}, {\"bbox_2d\": [91, 333, 506, 615], \"label\": \"Text\"}, {\"bbox_2d\": [91, 623, 506, 755], \"label\": \"Text\"}, {\"bbox_2d\": [91, 763, 506, 866], \"label\": \"Text\"}, {\"bbox_2d\": [536, 252, 951, 310], \"label\": \"Text\"}, {\"bbox_2d\": [536, 312, 951, 355], \"label\": \"Text\"}, {\"bbox_2d\": [536, 397, 952, 425], \"label\": \"Text\"}, {\"bbox_2d\": [536, 466, 951, 524], \"label\": \"Text\"}, {\"bbox_2d\": [536, 580, 951, 608], \"label\": \"Text\"}, {\"bbox_2d\": [536, 701, 951, 744], \"label\": \"Text\"}, {\"bbox_2d\": [536, 846, 633, 859], \"label\": \"Text\"}, {\"bbox_2d\": [647, 369, 951, 382], \"label\": \"Formula\"}, {\"bbox_2d\": [600, 439, 951, 452], \"label\": \"Formula\"}, {\"bbox_2d\": [666, 537, 951, 568], \"label\": \"Formula\"}, {\"bbox_2d\": [636, 620, 951, 651], \"label\": \"Formula\"}, {\"bbox_2d\": [636, 657, 951, 688], \"label\": \"Formula\"}, {\"bbox_2d\": [673, 754, 951, 786], \"label\": \"Formula\"}, {\"bbox_2d\": [696, 815, 951, 832], \"label\": \"Formula\"}, {\"bbox_2d\": [682, 871, 951, 902], \"label\": \"Formula\"}, {\"bbox_2d\": [677, 923, 951, 954], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 913, 505, 923], \"label\": \"Footnote\"}, {\"bbox_2d\": [91, 926, 383, 947], \"label\": \"Footnote\"}, {\"bbox_2d\": [31, 230, 61, 736], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [307, 386, 978, 635], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [8, 377, 984, 642], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 450, 105], \"label\": \"Text\"}, {\"bbox_2d\": [231, 183, 764, 197], \"label\": \"Text\"}, {\"bbox_2d\": [231, 208, 904, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 116, 477, 127], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 131, 527, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 257, 669, 567], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 573, 485, 584], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 607, 464, 619], \"label\": \"Text\"}, {\"bbox_2d\": [255, 629, 369, 641], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 645, 580, 657], \"label\": \"List-item\"}, {\"bbox_2d\": [289, 675, 761, 687], \"label\": \"Text\"}, {\"bbox_2d\": [231, 713, 926, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [720, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 165, 617, 179], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 450, 105], \"label\": \"Text\"}, {\"bbox_2d\": [231, 185, 764, 197], \"label\": \"Text\"}, {\"bbox_2d\": [231, 131, 527, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 115, 477, 127], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 207, 908, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 165, 617, 179], \"label\": \"Section-header\"}, {\"bbox_2d\": [257, 255, 666, 568], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 573, 484, 584], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 607, 465, 619], \"label\": \"Text\"}, {\"bbox_2d\": [290, 675, 761, 687], \"label\": \"Text\"}, {\"bbox_2d\": [257, 645, 591, 657], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 629, 379, 641], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 713, 926, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [721, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [680, 40, 937, 60], \"label\": \"Page-header\"}, {\"bbox_2d\": [223, 102, 813, 130], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 165, 401, 183], \"label\": \"Text\"}, {\"bbox_2d\": [98, 219, 941, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 309, 937, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 489, 941, 544], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 579, 869, 597], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 616, 941, 669], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 802, 108, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 380, 880, 418], \"label\": \"Text\"}, {\"bbox_2d\": [223, 433, 838, 454], \"label\": \"Text\"}, {\"bbox_2d\": [223, 686, 822, 724], \"label\": \"Text\"}, {\"bbox_2d\": [223, 740, 838, 760], \"label\": \"Text\"}, {\"bbox_2d\": [98, 975, 208, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [914, 955, 937, 972], \"label\": \"Page-footer\"}, {\"bbox_2d\": [898, 975, 937, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [223, 100, 819, 131], \"label\": \"Section-header\"}, {\"bbox_2d\": [679, 40, 942, 60], \"label\": \"Page-header\"}, {\"bbox_2d\": [98, 164, 404, 183], \"label\": \"Text\"}, {\"bbox_2d\": [98, 218, 948, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 308, 929, 364], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 488, 943, 544], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 614, 943, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 578, 870, 598], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 686, 818, 725], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 740, 841, 761], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 802, 108, 820], \"label\": \"Text\"}, {\"bbox_2d\": [899, 955, 945, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 974, 213, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [223, 380, 883, 418], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 434, 836, 454], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 74, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 93, 723, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 326, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [477, 124, 546, 144], \"label\": \"Text\"}, {\"bbox_2d\": [126, 155, 506, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [515, 158, 890, 972], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [698, 93, 722, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 325, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [516, 900, 896, 970], \"label\": \"Picture\"}, {\"bbox_2d\": [516, 810, 895, 877], \"label\": \"Picture\"}, {\"bbox_2d\": [515, 702, 896, 808], \"label\": \"Picture\"}, {\"bbox_2d\": [516, 580, 894, 685], \"label\": \"Picture\"}, {\"bbox_2d\": [515, 455, 895, 577], \"label\": \"Picture\"}, {\"bbox_2d\": [515, 361, 895, 416], \"label\": \"Picture\"}, {\"bbox_2d\": [515, 134, 893, 238], \"label\": \"Picture\"}, {\"bbox_2d\": [479, 125, 548, 135], \"label\": \"Text\"}, {\"bbox_2d\": [868, 241, 896, 249], \"label\": \"Text\"}, {\"bbox_2d\": [129, 900, 506, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [126, 809, 509, 900], \"label\": \"Picture\"}, {\"bbox_2d\": [129, 702, 505, 760], \"label\": \"Picture\"}, {\"bbox_2d\": [132, 580, 506, 700], \"label\": \"Picture\"}, {\"bbox_2d\": [128, 455, 505, 577], \"label\": \"Picture\"}, {\"bbox_2d\": [129, 361, 507, 453], \"label\": \"Picture\"}, {\"bbox_2d\": [128, 135, 509, 190], \"label\": \"Picture\"}, {\"bbox_2d\": [127, 242, 504, 360], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [783, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [546, 141, 768, 965], \"label\": \"Picture\"}, {\"bbox_2d\": [156, 141, 423, 969], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [779, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [503, 93, 522, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [160, 139, 419, 970], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 139, 767, 963], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [742, 48, 903, 62], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 75, 866, 160], \"label\": \"Title\"}, {\"bbox_2d\": [425, 188, 584, 202], \"label\": \"Text\"}, {\"bbox_2d\": [326, 209, 681, 224], \"label\": \"Text\"}, {\"bbox_2d\": [157, 858, 872, 904], \"label\": \"Text\"}, {\"bbox_2d\": [159, 935, 812, 949], \"label\": \"Text\"}, {\"bbox_2d\": [166, 992, 302, 1005], \"label\": \"Page-footer\"}, {\"bbox_2d\": [692, 1008, 901, 1023], \"label\": \"Page-footer\"}, {\"bbox_2d\": [157, 238, 823, 835], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [743, 50, 908, 63], \"label\": \"Section-header\"}, {\"bbox_2d\": [158, 860, 804, 871], \"label\": \"Text\"}, {\"bbox_2d\": [157, 875, 873, 901], \"label\": \"Text\"}, {\"bbox_2d\": [158, 935, 816, 947], \"label\": \"Section-header\"}, {\"bbox_2d\": [693, 1008, 905, 1022], \"label\": \"Section-header\"}, {\"bbox_2d\": [159, 237, 826, 835], \"label\": \"Picture\"}, {\"bbox_2d\": [128, 78, 873, 163], \"label\": \"Title\"}, {\"bbox_2d\": [325, 166, 684, 224], \"label\": \"Text\"}, {\"bbox_2d\": [165, 987, 256, 1008], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [567, 57, 702, 71], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 196, 382, 210], \"label\": \"Section-header\"}, {\"bbox_2d\": [567, 459, 716, 473], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 462, 341, 476], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 632, 517, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 681, 310, 694], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 58, 517, 179], \"label\": \"Text\"}, {\"bbox_2d\": [123, 215, 517, 351], \"label\": \"Text\"}, {\"bbox_2d\": [123, 353, 517, 444], \"label\": \"Text\"}, {\"bbox_2d\": [123, 479, 517, 601], \"label\": \"Text\"}, {\"bbox_2d\": [567, 476, 962, 583], \"label\": \"Text\"}, {\"bbox_2d\": [567, 585, 962, 630], \"label\": \"Text\"}, {\"bbox_2d\": [567, 241, 962, 443], \"label\": \"Text\"}, {\"bbox_2d\": [567, 758, 962, 896], \"label\": \"Text\"}, {\"bbox_2d\": [123, 699, 517, 805], \"label\": \"Text\"}, {\"bbox_2d\": [123, 807, 517, 883], \"label\": \"Text\"}, {\"bbox_2d\": [862, 999, 974, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [564, 637, 964, 745], \"label\": \"Picture\"}, {\"bbox_2d\": [566, 78, 964, 223], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [565, 55, 701, 77], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 193, 382, 216], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 460, 340, 482], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 456, 717, 479], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 636, 530, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 679, 311, 702], \"label\": \"Section-header\"}, {\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [565, 760, 968, 895], \"label\": \"Text\"}, {\"bbox_2d\": [120, 810, 522, 883], \"label\": \"Text\"}, {\"bbox_2d\": [120, 700, 522, 804], \"label\": \"Text\"}, {\"bbox_2d\": [565, 587, 967, 629], \"label\": \"Text\"}, {\"bbox_2d\": [565, 478, 968, 582], \"label\": \"Text\"}, {\"bbox_2d\": [120, 481, 522, 601], \"label\": \"Text\"}, {\"bbox_2d\": [120, 356, 529, 444], \"label\": \"Text\"}, {\"bbox_2d\": [565, 243, 748, 347], \"label\": \"Text\"}, {\"bbox_2d\": [565, 353, 968, 441], \"label\": \"Text\"}, {\"bbox_2d\": [120, 215, 522, 350], \"label\": \"Text\"}, {\"bbox_2d\": [120, 58, 522, 146], \"label\": \"Text\"}, {\"bbox_2d\": [120, 152, 522, 178], \"label\": \"Text\"}, {\"bbox_2d\": [564, 80, 963, 222], \"label\": \"Picture\"}, {\"bbox_2d\": [756, 235, 964, 341], \"label\": \"Picture\"}, {\"bbox_2d\": [565, 637, 759, 744], \"label\": \"Picture\"}, {\"bbox_2d\": [769, 637, 964, 744], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [77, 69, 367, 80], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 93, 451, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [75, 114, 926, 584], \"label\": \"Table\"}, {\"bbox_2d\": [77, 589, 403, 599], \"label\": \"Text\"}, {\"bbox_2d\": [77, 796, 364, 942], \"label\": \"Picture\"}, {\"bbox_2d\": [381, 800, 567, 811], \"label\": \"Section-header\"}, {\"bbox_2d\": [381, 827, 667, 918], \"label\": \"Text\"}, {\"bbox_2d\": [966, 957, 981, 968], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [72, 794, 365, 943], \"label\": \"Picture\"}, {\"bbox_2d\": [381, 801, 567, 812], \"label\": \"Caption\"}, {\"bbox_2d\": [381, 827, 778, 916], \"label\": \"Text\"}, {\"bbox_2d\": [966, 982, 981, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [76, 589, 398, 599], \"label\": \"Text\"}, {\"bbox_2d\": [76, 70, 366, 80], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 94, 561, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [75, 112, 922, 581], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 473, 911, 905], \"label\": \"Text\"}, {\"bbox_2d\": [121, 918, 911, 961], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 436, 728, 453], \"label\": \"Caption\"}, {\"bbox_2d\": [119, 110, 873, 406], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [118, 110, 894, 392], \"label\": \"Picture\"}, {\"bbox_2d\": [273, 435, 758, 449], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 479, 911, 906], \"label\": \"Text\"}, {\"bbox_2d\": [121, 917, 910, 960], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 406, 921, 449], \"label\": \"Text\"}, {\"bbox_2d\": [231, 466, 920, 495], \"label\": \"Text\"}, {\"bbox_2d\": [231, 511, 926, 538], \"label\": \"Text\"}, {\"bbox_2d\": [231, 554, 908, 582], \"label\": \"Text\"}, {\"bbox_2d\": [231, 691, 913, 735], \"label\": \"Text\"}, {\"bbox_2d\": [231, 852, 895, 911], \"label\": \"Text\"}, {\"bbox_2d\": [231, 831, 398, 847], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 600, 575, 611], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 622, 908, 666], \"label\": \"Text\"}, {\"bbox_2d\": [231, 751, 543, 762], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 774, 696, 801], \"label\": \"Text\"}, {\"bbox_2d\": [231, 372, 488, 383], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 867, 901, 911], \"label\": \"Text\"}, {\"bbox_2d\": [231, 326, 688, 369], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 103, 690, 368], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [232, 100, 691, 369], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 372, 486, 383], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 406, 922, 450], \"label\": \"Text\"}, {\"bbox_2d\": [231, 467, 920, 494], \"label\": \"Text\"}, {\"bbox_2d\": [231, 511, 926, 539], \"label\": \"Text\"}, {\"bbox_2d\": [231, 555, 911, 583], \"label\": \"Text\"}, {\"bbox_2d\": [231, 692, 914, 735], \"label\": \"Text\"}, {\"bbox_2d\": [231, 852, 910, 912], \"label\": \"Text\"}, {\"bbox_2d\": [231, 833, 398, 847], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 752, 540, 763], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 600, 574, 611], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 622, 916, 665], \"label\": \"Text\"}, {\"bbox_2d\": [231, 774, 696, 801], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [275, 83, 889, 129], \"label\": \"Text\"}, {\"bbox_2d\": [275, 159, 886, 218], \"label\": \"Text\"}, {\"bbox_2d\": [275, 226, 563, 236], \"label\": \"Text\"}, {\"bbox_2d\": [275, 511, 890, 539], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [278, 362, 585, 372], \"label\": \"Caption\"}, {\"bbox_2d\": [278, 282, 395, 293], \"label\": \"Caption\"}, {\"bbox_2d\": [278, 360, 585, 370], \"label\": \"Caption\"}, {\"bbox_2d\": [277, 249, 912, 347], \"label\": \"Picture\"}, {\"bbox_2d\": [274, 385, 914, 483], \"label\": \"Picture\"}, {\"bbox_2d\": [277, 143, 629, 157], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 889, 129], \"label\": \"Text\"}, {\"bbox_2d\": [273, 159, 886, 220], \"label\": \"Text\"}, {\"bbox_2d\": [277, 283, 394, 295], \"label\": \"Text\"}, {\"bbox_2d\": [277, 435, 394, 447], \"label\": \"Text\"}, {\"bbox_2d\": [273, 512, 890, 541], \"label\": \"Text\"}, {\"bbox_2d\": [273, 144, 629, 158], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 224, 562, 237], \"label\": \"Section-header\"}, {\"bbox_2d\": [281, 361, 585, 374], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [14, 472, 20, 525], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 1001, 223, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [26, 243, 1035, 885], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [12, 470, 22, 532], \"label\": \"Page-header\"}, {\"bbox_2d\": [51, 1002, 223, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [28, 0, 1034, 987], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [761, 73, 965, 101], \"label\": \"Page-header\"}, {\"bbox_2d\": [554, 118, 965, 224], \"label\": \"Text\"}, {\"bbox_2d\": [554, 229, 965, 637], \"label\": \"Text\"}, {\"bbox_2d\": [559, 642, 965, 843], \"label\": \"Text\"}, {\"bbox_2d\": [559, 851, 971, 983], \"label\": \"Text\"}, {\"bbox_2d\": [103, 545, 517, 689], \"label\": \"Text\"}, {\"bbox_2d\": [107, 693, 519, 985], \"label\": \"Text\"}, {\"bbox_2d\": [103, 280, 513, 500], \"label\": \"Text\"}, {\"bbox_2d\": [104, 517, 425, 530], \"label\": \"Section-header\"}, {\"bbox_2d\": [100, 118, 512, 264], \"label\": \"Formula\"}, {\"bbox_2d\": [527, 998, 554, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [759, 71, 982, 102], \"label\": \"Page-header\"}, {\"bbox_2d\": [552, 114, 979, 225], \"label\": \"Text\"}, {\"bbox_2d\": [100, 115, 517, 151], \"label\": \"Text\"}, {\"bbox_2d\": [267, 181, 524, 218], \"label\": \"Text\"}, {\"bbox_2d\": [101, 163, 339, 181], \"label\": \"Text\"}, {\"bbox_2d\": [244, 230, 371, 266], \"label\": \"Text\"}, {\"bbox_2d\": [554, 226, 976, 636], \"label\": \"Text\"}, {\"bbox_2d\": [101, 277, 523, 499], \"label\": \"Text\"}, {\"bbox_2d\": [104, 540, 527, 689], \"label\": \"Text\"}, {\"bbox_2d\": [558, 638, 983, 846], \"label\": \"Text\"}, {\"bbox_2d\": [104, 514, 425, 531], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 689, 534, 987], \"label\": \"Text\"}, {\"bbox_2d\": [560, 848, 988, 985], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [679, 14, 997, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [160, 99, 963, 670], \"label\": \"Table\"}, {\"bbox_2d\": [160, 675, 963, 710], \"label\": \"List-item\"}, {\"bbox_2d\": [160, 711, 963, 733], \"label\": \"List-item\"}, {\"bbox_2d\": [160, 766, 522, 943], \"label\": \"Text\"}, {\"bbox_2d\": [180, 947, 522, 972], \"label\": \"Text\"}, {\"bbox_2d\": [587, 766, 963, 793], \"label\": \"Text\"}, {\"bbox_2d\": [587, 795, 963, 958], \"label\": \"Text\"}, {\"bbox_2d\": [587, 960, 935, 972], \"label\": \"Text\"}, {\"bbox_2d\": [864, 999, 990, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [160, 767, 460, 793], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [681, 20, 905, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [953, 18, 997, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [161, 677, 960, 709], \"label\": \"Text\"}, {\"bbox_2d\": [161, 712, 963, 733], \"label\": \"Text\"}, {\"bbox_2d\": [180, 812, 500, 823], \"label\": \"Text\"}, {\"bbox_2d\": [180, 827, 527, 943], \"label\": \"Text\"}, {\"bbox_2d\": [180, 948, 528, 973], \"label\": \"Text\"}, {\"bbox_2d\": [614, 767, 961, 793], \"label\": \"Text\"}, {\"bbox_2d\": [614, 797, 964, 958], \"label\": \"Text\"}, {\"bbox_2d\": [628, 963, 941, 973], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [155, 101, 964, 672], \"label\": \"Table\"}, {\"bbox_2d\": [161, 767, 465, 793], \"label\": \"Section-header\"}, {\"bbox_2d\": [180, 797, 217, 808], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [149, 108, 886, 148], \"label\": \"Text\"}, {\"bbox_2d\": [147, 185, 888, 376], \"label\": \"Text\"}, {\"bbox_2d\": [148, 415, 888, 733], \"label\": \"Text\"}, {\"bbox_2d\": [147, 770, 887, 911], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 108, 888, 147], \"label\": \"Text\"}, {\"bbox_2d\": [148, 185, 889, 376], \"label\": \"Text\"}, {\"bbox_2d\": [148, 415, 889, 732], \"label\": \"Text\"}, {\"bbox_2d\": [148, 770, 888, 911], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [108, 32, 293, 123], \"label\": \"Picture\"}, {\"bbox_2d\": [460, 73, 623, 85], \"label\": \"Section-header\"}, {\"bbox_2d\": [180, 134, 392, 173], \"label\": \"Text\"}, {\"bbox_2d\": [292, 174, 421, 186], \"label\": \"Text\"}, {\"bbox_2d\": [292, 206, 462, 218], \"label\": \"Text\"}, {\"bbox_2d\": [292, 238, 712, 249], \"label\": \"Text\"}, {\"bbox_2d\": [292, 270, 798, 281], \"label\": \"Text\"}, {\"bbox_2d\": [179, 365, 894, 376], \"label\": \"Text\"}, {\"bbox_2d\": [179, 428, 759, 440], \"label\": \"Text\"}, {\"bbox_2d\": [179, 491, 902, 544], \"label\": \"Text\"}, {\"bbox_2d\": [179, 564, 902, 604], \"label\": \"Text\"}, {\"bbox_2d\": [179, 696, 501, 698], \"label\": \"Text\"}, {\"bbox_2d\": [179, 624, 782, 635], \"label\": \"Text\"}, {\"bbox_2d\": [179, 686, 501, 698], \"label\": \"Text\"}, {\"bbox_2d\": [179, 810, 902, 836], \"label\": \"Text\"}, {\"bbox_2d\": [209, 856, 902, 895], \"label\": \"List-item\"}, {\"bbox_2d\": [209, 896, 902, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [209, 718, 747, 730], \"label\": \"List-item\"}, {\"bbox_2d\": [209, 731, 664, 743], \"label\": \"List-item\"}, {\"bbox_2d\": [209, 745, 795, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [209, 655, 400, 666], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 459, 360, 471], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 395, 343, 407], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 333, 397, 345], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 301, 528, 313], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 777, 729, 789], \"label\": \"Section-header\"}, {\"bbox_2d\": [885, 974, 899, 982], \"label\": \"Page-footer\"}, {\"bbox_2d\": [737, 62, 902, 71], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [458, 72, 627, 86], \"label\": \"Text\"}, {\"bbox_2d\": [179, 363, 900, 377], \"label\": \"Text\"}, {\"bbox_2d\": [179, 426, 758, 439], \"label\": \"Text\"}, {\"bbox_2d\": [179, 488, 907, 545], \"label\": \"Text\"}, {\"bbox_2d\": [179, 562, 905, 604], \"label\": \"Text\"}, {\"bbox_2d\": [179, 622, 782, 636], \"label\": \"Text\"}, {\"bbox_2d\": [179, 685, 507, 698], \"label\": \"Text\"}, {\"bbox_2d\": [179, 807, 907, 835], \"label\": \"Text\"}, {\"bbox_2d\": [208, 853, 910, 895], \"label\": \"List-item\"}, {\"bbox_2d\": [208, 895, 911, 923], \"label\": \"List-item\"}, {\"bbox_2d\": [208, 716, 752, 730], \"label\": \"List-item\"}, {\"bbox_2d\": [208, 730, 668, 744], \"label\": \"List-item\"}, {\"bbox_2d\": [208, 744, 800, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [219, 776, 734, 789], \"label\": \"Section-header\"}, {\"bbox_2d\": [219, 653, 404, 667], \"label\": \"Section-header\"}, {\"bbox_2d\": [219, 457, 364, 471], \"label\": \"Section-header\"}, {\"bbox_2d\": [219, 394, 347, 408], \"label\": \"Section-header\"}, {\"bbox_2d\": [219, 331, 401, 345], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 299, 534, 314], \"label\": \"Section-header\"}, {\"bbox_2d\": [735, 60, 905, 71], \"label\": \"Page-header\"}, {\"bbox_2d\": [886, 973, 900, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [179, 131, 289, 173], \"label\": \"Formula\"}, {\"bbox_2d\": [112, 33, 290, 122], \"label\": \"Picture\"}, {\"bbox_2d\": [829, 83, 902, 121], \"label\": \"Picture\"}, {\"bbox_2d\": [179, 173, 803, 281], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 374, 911, 500], \"label\": \"Text\"}, {\"bbox_2d\": [121, 511, 911, 690], \"label\": \"Text\"}, {\"bbox_2d\": [121, 759, 911, 910], \"label\": \"Text\"}, {\"bbox_2d\": [121, 922, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [382, 705, 911, 747], \"label\": \"Formula\"}, {\"bbox_2d\": [320, 270, 911, 357], \"label\": \"Formula\"}, {\"bbox_2d\": [179, 180, 911, 220], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 911, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 232, 171, 246], \"label\": \"Text\"}, {\"bbox_2d\": [121, 376, 911, 500], \"label\": \"Text\"}, {\"bbox_2d\": [121, 513, 911, 691], \"label\": \"Text\"}, {\"bbox_2d\": [121, 759, 911, 910], \"label\": \"Text\"}, {\"bbox_2d\": [121, 923, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [381, 706, 911, 742], \"label\": \"Formula\"}, {\"bbox_2d\": [176, 180, 911, 216], \"label\": \"Formula\"}, {\"bbox_2d\": [321, 272, 911, 352], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [79, 55, 359, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 548, 191, 570], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 591, 409, 962], \"label\": \"Table\"}, {\"bbox_2d\": [528, 591, 823, 930], \"label\": \"Table\"}, {\"bbox_2d\": [746, 212, 926, 295], \"label\": \"Caption\"}, {\"bbox_2d\": [114, 212, 272, 254], \"label\": \"Caption\"}, {\"bbox_2d\": [441, 212, 596, 237], \"label\": \"Caption\"}, {\"bbox_2d\": [753, 404, 948, 465], \"label\": \"Caption\"}, {\"bbox_2d\": [534, 404, 702, 491], \"label\": \"Caption\"}, {\"bbox_2d\": [300, 404, 488, 491], \"label\": \"Caption\"}, {\"bbox_2d\": [85, 404, 238, 505], \"label\": \"Caption\"}, {\"bbox_2d\": [108, 308, 231, 398], \"label\": \"Picture\"}, {\"bbox_2d\": [323, 308, 451, 398], \"label\": \"Picture\"}, {\"bbox_2d\": [552, 308, 677, 398], \"label\": \"Picture\"}, {\"bbox_2d\": [776, 308, 913, 398], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 421, 721, 491], \"label\": \"Caption\"}, {\"bbox_2d\": [85, 521, 242, 535], \"label\": \"Text\"}, {\"bbox_2d\": [269, 521, 414, 535], \"label\": \"Text\"}, {\"bbox_2d\": [324, 521, 502, 535], \"label\": \"Text\"}, {\"bbox_2d\": [742, 521, 961, 535], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [76, 57, 362, 84], \"label\": \"Section-header\"}, {\"bbox_2d\": [132, 112, 262, 206], \"label\": \"Picture\"}, {\"bbox_2d\": [454, 114, 580, 205], \"label\": \"Picture\"}, {\"bbox_2d\": [761, 114, 888, 203], \"label\": \"Picture\"}, {\"bbox_2d\": [730, 210, 924, 300], \"label\": \"Caption\"}, {\"bbox_2d\": [437, 209, 596, 243], \"label\": \"Caption\"}, {\"bbox_2d\": [116, 211, 271, 259], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 403, 244, 508], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 404, 490, 494], \"label\": \"Caption\"}, {\"bbox_2d\": [518, 403, 715, 493], \"label\": \"Caption\"}, {\"bbox_2d\": [752, 404, 944, 467], \"label\": \"Caption\"}, {\"bbox_2d\": [777, 309, 913, 400], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 309, 677, 400], \"label\": \"Picture\"}, {\"bbox_2d\": [326, 307, 448, 400], \"label\": \"Picture\"}, {\"bbox_2d\": [104, 305, 233, 400], \"label\": \"Picture\"}, {\"bbox_2d\": [78, 522, 962, 539], \"label\": \"Text\"}, {\"bbox_2d\": [75, 547, 194, 572], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 591, 320, 623], \"label\": \"Text\"}, {\"bbox_2d\": [81, 622, 323, 654], \"label\": \"Text\"}, {\"bbox_2d\": [81, 653, 351, 684], \"label\": \"Text\"}, {\"bbox_2d\": [81, 684, 392, 715], \"label\": \"Text\"}, {\"bbox_2d\": [81, 715, 409, 746], \"label\": \"Text\"}, {\"bbox_2d\": [81, 746, 337, 777], \"label\": \"Text\"}, {\"bbox_2d\": [81, 777, 368, 808], \"label\": \"Text\"}, {\"bbox_2d\": [81, 809, 369, 840], \"label\": \"Text\"}, {\"bbox_2d\": [81, 841, 262, 872], \"label\": \"Text\"}, {\"bbox_2d\": [81, 872, 206, 903], \"label\": \"Text\"}, {\"bbox_2d\": [81, 903, 463, 934], \"label\": \"Text\"}, {\"bbox_2d\": [81, 934, 355, 965], \"label\": \"Text\"}, {\"bbox_2d\": [530, 901, 812, 932], \"label\": \"Text\"}, {\"bbox_2d\": [530, 870, 799, 901], \"label\": \"Text\"}, {\"bbox_2d\": [530, 839, 755, 870], \"label\": \"Text\"}, {\"bbox_2d\": [530, 808, 810, 839], \"label\": \"Text\"}, {\"bbox_2d\": [530, 777, 809, 808], \"label\": \"Text\"}, {\"bbox_2d\": [530, 746, 750, 777], \"label\": \"Text\"}, {\"bbox_2d\": [530, 715, 818, 746], \"label\": \"Text\"}, {\"bbox_2d\": [530, 684, 765, 715], \"label\": \"Text\"}, {\"bbox_2d\": [530, 653, 800, 684], \"label\": \"Text\"}, {\"bbox_2d\": [530, 622, 818, 654], \"label\": \"Text\"}, {\"bbox_2d\": [530, 591, 742, 623], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 55, 299, 86], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 98, 473, 323], \"label\": \"Text\"}, {\"bbox_2d\": [78, 325, 472, 661], \"label\": \"Text\"}, {\"bbox_2d\": [78, 662, 473, 972], \"label\": \"Text\"}, {\"bbox_2d\": [533, 98, 926, 229], \"label\": \"Text\"}, {\"bbox_2d\": [533, 231, 926, 644], \"label\": \"Text\"}, {\"bbox_2d\": [510, 645, 926, 972], \"label\": \"Text\"}, {\"bbox_2d\": [477, 981, 505, 997], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [64, 57, 312, 88], \"label\": \"Page-header\"}, {\"bbox_2d\": [531, 99, 931, 229], \"label\": \"Text\"}, {\"bbox_2d\": [79, 100, 485, 323], \"label\": \"Text\"}, {\"bbox_2d\": [531, 231, 937, 643], \"label\": \"Text\"}, {\"bbox_2d\": [79, 324, 486, 661], \"label\": \"Text\"}, {\"bbox_2d\": [509, 644, 936, 972], \"label\": \"Text\"}, {\"bbox_2d\": [78, 662, 485, 971], \"label\": \"Text\"}, {\"bbox_2d\": [476, 981, 509, 996], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 70, 283, 97], \"label\": \"Text\"}, {\"bbox_2d\": [91, 161, 506, 205], \"label\": \"Text\"}, {\"bbox_2d\": [91, 315, 506, 342], \"label\": \"Text\"}, {\"bbox_2d\": [536, 259, 951, 318], \"label\": \"Text\"}, {\"bbox_2d\": [536, 338, 951, 396], \"label\": \"Text\"}, {\"bbox_2d\": [536, 578, 951, 606], \"label\": \"Text\"}, {\"bbox_2d\": [91, 547, 506, 888], \"label\": \"Text\"}, {\"bbox_2d\": [536, 673, 951, 696], \"label\": \"Text\"}, {\"bbox_2d\": [91, 889, 506, 945], \"label\": \"Text\"}, {\"bbox_2d\": [536, 876, 847, 889], \"label\": \"Text\"}, {\"bbox_2d\": [536, 919, 951, 945], \"label\": \"Text\"}, {\"bbox_2d\": [536, 630, 936, 663], \"label\": \"Formula\"}, {\"bbox_2d\": [596, 716, 921, 825], \"label\": \"Formula\"}, {\"bbox_2d\": [596, 854, 895, 898], \"label\": \"Formula\"}, {\"bbox_2d\": [596, 965, 895, 998], \"label\": \"Formula\"}, {\"bbox_2d\": [596, 464, 921, 556], \"label\": \"Formula\"}, {\"bbox_2d\": [596, 338, 951, 427], \"label\": \"Formula\"}, {\"bbox_2d\": [568, 197, 951, 243], \"label\": \"Formula\"}, {\"bbox_2d\": [568, 247, 951, 299], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 148, 951, 183], \"label\": \"Text\"}, {\"bbox_2d\": [536, 70, 951, 167], \"label\": \"Text\"}, {\"bbox_2d\": [122, 354, 275, 377], \"label\": \"Formula\"}, {\"bbox_2d\": [122, 379, 506, 527], \"label\": \"Formula\"}, {\"bbox_2d\": [122, 219, 506, 305], \"label\": \"Formula\"}, {\"bbox_2d\": [178, 114, 506, 171], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 99, 951, 134], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [934, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 72, 951, 143], \"label\": \"Text\"}, {\"bbox_2d\": [536, 149, 951, 176], \"label\": \"Text\"}, {\"bbox_2d\": [91, 72, 506, 101], \"label\": \"Text\"}, {\"bbox_2d\": [91, 162, 506, 203], \"label\": \"Text\"}, {\"bbox_2d\": [536, 258, 951, 319], \"label\": \"Text\"}, {\"bbox_2d\": [91, 316, 506, 342], \"label\": \"Text\"}, {\"bbox_2d\": [564, 192, 951, 213], \"label\": \"Formula\"}, {\"bbox_2d\": [564, 216, 951, 237], \"label\": \"Formula\"}, {\"bbox_2d\": [176, 111, 421, 151], \"label\": \"Formula\"}, {\"bbox_2d\": [108, 219, 489, 305], \"label\": \"Formula\"}, {\"bbox_2d\": [537, 334, 950, 560], \"label\": \"Formula\"}, {\"bbox_2d\": [123, 355, 473, 541], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 625, 963, 670], \"label\": \"Formula\"}, {\"bbox_2d\": [595, 720, 892, 829], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 551, 506, 886], \"label\": \"Text\"}, {\"bbox_2d\": [536, 581, 951, 608], \"label\": \"Text\"}, {\"bbox_2d\": [536, 673, 951, 700], \"label\": \"Text\"}, {\"bbox_2d\": [536, 850, 848, 862], \"label\": \"Text\"}, {\"bbox_2d\": [91, 889, 506, 946], \"label\": \"Text\"}, {\"bbox_2d\": [536, 919, 951, 946], \"label\": \"Text\"}, {\"bbox_2d\": [590, 883, 897, 895], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 49, 296, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [146, 128, 918, 171], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 178, 917, 258], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 260, 917, 324], \"label\": \"Text\"}, {\"bbox_2d\": [121, 327, 883, 375], \"label\": \"Text\"}, {\"bbox_2d\": [121, 378, 917, 474], \"label\": \"Text\"}, {\"bbox_2d\": [121, 477, 917, 524], \"label\": \"Text\"}, {\"bbox_2d\": [121, 527, 859, 555], \"label\": \"Text\"}, {\"bbox_2d\": [121, 658, 618, 672], \"label\": \"Section-header\"}, {\"bbox_2d\": [146, 558, 917, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 609, 917, 656], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 675, 917, 755], \"label\": \"Text\"}, {\"bbox_2d\": [184, 758, 876, 786], \"label\": \"Text\"}, {\"bbox_2d\": [146, 791, 909, 855], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 858, 899, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 907, 917, 936], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 974, 440, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 126, 931, 176], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 176, 951, 258], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 557, 935, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 606, 942, 655], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 788, 941, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 855, 929, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 904, 954, 937], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 259, 942, 325], \"label\": \"Text\"}, {\"bbox_2d\": [121, 325, 908, 374], \"label\": \"Text\"}, {\"bbox_2d\": [121, 375, 955, 474], \"label\": \"Text\"}, {\"bbox_2d\": [121, 474, 922, 523], \"label\": \"Text\"}, {\"bbox_2d\": [121, 523, 884, 556], \"label\": \"Text\"}, {\"bbox_2d\": [121, 672, 954, 755], \"label\": \"Text\"}, {\"bbox_2d\": [121, 755, 909, 788], \"label\": \"Text\"}, {\"bbox_2d\": [121, 656, 639, 672], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [83, 74, 510, 297], \"label\": \"Text\"}, {\"bbox_2d\": [83, 418, 510, 980], \"label\": \"Text\"}, {\"bbox_2d\": [528, 165, 957, 728], \"label\": \"Text\"}, {\"bbox_2d\": [528, 757, 957, 980], \"label\": \"Text\"}, {\"bbox_2d\": [946, 38, 953, 48], \"label\": \"Page-header\"}, {\"bbox_2d\": [528, 75, 640, 88], \"label\": \"Section-header\"}, {\"bbox_2d\": [97, 347, 492, 386], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [947, 39, 953, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [528, 76, 639, 87], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 349, 492, 387], \"label\": \"Section-header\"}, {\"bbox_2d\": [82, 76, 507, 296], \"label\": \"Text\"}, {\"bbox_2d\": [82, 420, 507, 979], \"label\": \"Text\"}, {\"bbox_2d\": [528, 168, 953, 727], \"label\": \"Text\"}, {\"bbox_2d\": [528, 759, 953, 979], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 543, 658, 556], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 567, 893, 595], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 507, 874, 534], \"label\": \"Text\"}, {\"bbox_2d\": [226, 614, 928, 765], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 349, 626, 359], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 396, 895, 440], \"label\": \"Text\"}, {\"bbox_2d\": [109, 474, 909, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 769, 640, 780], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 699, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 140, 697, 340], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 567, 892, 595], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 545, 657, 557], \"label\": \"List-item\"}, {\"bbox_2d\": [230, 139, 698, 341], \"label\": \"Picture\"}, {\"bbox_2d\": [228, 612, 927, 764], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 768, 639, 779], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 347, 625, 358], \"label\": \"Caption\"}, {\"bbox_2d\": [241, 396, 893, 439], \"label\": \"Text\"}, {\"bbox_2d\": [231, 507, 874, 535], \"label\": \"Text\"}, {\"bbox_2d\": [109, 473, 904, 488], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [59, 722, 950, 768], \"label\": \"Text\"}, {\"bbox_2d\": [59, 788, 950, 864], \"label\": \"Text\"}, {\"bbox_2d\": [59, 854, 950, 888], \"label\": \"Text\"}, {\"bbox_2d\": [130, 89, 621, 656], \"label\": \"Picture\"}, {\"bbox_2d\": [327, 684, 379, 698], \"label\": \"Caption\"}, {\"bbox_2d\": [772, 26, 950, 44], \"label\": \"Page-header\"}, {\"bbox_2d\": [485, 994, 528, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [774, 29, 952, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [58, 724, 975, 769], \"label\": \"Text\"}, {\"bbox_2d\": [58, 790, 972, 836], \"label\": \"Text\"}, {\"bbox_2d\": [58, 857, 976, 886], \"label\": \"Text\"}, {\"bbox_2d\": [485, 994, 529, 1006], \"label\": \"Page-footer\"}, {\"bbox_2d\": [132, 93, 614, 696], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [189, 23, 961, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 82, 898, 299], \"label\": \"Text\"}, {\"bbox_2d\": [526, 308, 928, 544], \"label\": \"Text\"}, {\"bbox_2d\": [526, 551, 936, 730], \"label\": \"Text\"}, {\"bbox_2d\": [526, 737, 920, 918], \"label\": \"Text\"}, {\"bbox_2d\": [78, 83, 456, 168], \"label\": \"Text\"}, {\"bbox_2d\": [78, 176, 460, 469], \"label\": \"Text\"}, {\"bbox_2d\": [78, 477, 491, 711], \"label\": \"Text\"}, {\"bbox_2d\": [78, 757, 451, 881], \"label\": \"Text\"}, {\"bbox_2d\": [78, 889, 434, 918], \"label\": \"Text\"}, {\"bbox_2d\": [78, 737, 402, 748], \"label\": \"Section-header\"}, {\"bbox_2d\": [482, 948, 554, 1005], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [301, 21, 753, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 79, 447, 168], \"label\": \"Text\"}, {\"bbox_2d\": [78, 172, 461, 468], \"label\": \"Text\"}, {\"bbox_2d\": [78, 472, 491, 712], \"label\": \"Text\"}, {\"bbox_2d\": [78, 753, 449, 880], \"label\": \"Text\"}, {\"bbox_2d\": [78, 884, 438, 918], \"label\": \"Text\"}, {\"bbox_2d\": [525, 734, 918, 918], \"label\": \"Text\"}, {\"bbox_2d\": [525, 547, 939, 731], \"label\": \"Text\"}, {\"bbox_2d\": [525, 303, 931, 543], \"label\": \"Text\"}, {\"bbox_2d\": [525, 79, 962, 300], \"label\": \"Text\"}, {\"bbox_2d\": [78, 734, 402, 750], \"label\": \"Section-header\"}, {\"bbox_2d\": [479, 945, 556, 1011], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [341, 43, 958, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 99, 458, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [117, 117, 877, 245], \"label\": \"Table\"}, {\"bbox_2d\": [117, 263, 525, 305], \"label\": \"Text\"}, {\"bbox_2d\": [116, 300, 877, 661], \"label\": \"Table\"}, {\"bbox_2d\": [115, 671, 902, 728], \"label\": \"Text\"}, {\"bbox_2d\": [115, 747, 922, 833], \"label\": \"Text\"}, {\"bbox_2d\": [115, 852, 889, 894], \"label\": \"Text\"}, {\"bbox_2d\": [115, 910, 907, 983], \"label\": \"Text\"}, {\"bbox_2d\": [781, 990, 974, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [340, 43, 958, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [112, 113, 877, 658], \"label\": \"Table\"}, {\"bbox_2d\": [114, 98, 459, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [114, 670, 916, 727], \"label\": \"Text\"}, {\"bbox_2d\": [114, 745, 923, 832], \"label\": \"Text\"}, {\"bbox_2d\": [114, 850, 888, 892], \"label\": \"Text\"}, {\"bbox_2d\": [114, 910, 908, 983], \"label\": \"Text\"}, {\"bbox_2d\": [781, 990, 974, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 62, 1009, 105], \"label\": \"Text\"}, {\"bbox_2d\": [25, 114, 1009, 148], \"label\": \"Text\"}, {\"bbox_2d\": [25, 156, 1009, 199], \"label\": \"Text\"}, {\"bbox_2d\": [25, 207, 1009, 227], \"label\": \"Text\"}, {\"bbox_2d\": [25, 235, 1009, 269], \"label\": \"Text\"}, {\"bbox_2d\": [25, 277, 1009, 320], \"label\": \"Text\"}, {\"bbox_2d\": [25, 328, 1009, 418], \"label\": \"Text\"}, {\"bbox_2d\": [25, 426, 1009, 446], \"label\": \"Text\"}, {\"bbox_2d\": [25, 454, 1009, 554], \"label\": \"Text\"}, {\"bbox_2d\": [508, 589, 525, 599], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 60, 1009, 105], \"label\": \"Text\"}, {\"bbox_2d\": [25, 113, 1009, 146], \"label\": \"Text\"}, {\"bbox_2d\": [25, 153, 1009, 198], \"label\": \"Text\"}, {\"bbox_2d\": [25, 205, 1009, 227], \"label\": \"Text\"}, {\"bbox_2d\": [25, 235, 1009, 268], \"label\": \"Text\"}, {\"bbox_2d\": [25, 275, 1009, 320], \"label\": \"Text\"}, {\"bbox_2d\": [25, 327, 1009, 417], \"label\": \"Text\"}, {\"bbox_2d\": [25, 425, 1009, 447], \"label\": \"Text\"}, {\"bbox_2d\": [25, 454, 1009, 555], \"label\": \"Text\"}, {\"bbox_2d\": [508, 587, 525, 598], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 115, 1009, 132], \"label\": \"Text\"}, {\"bbox_2d\": [25, 157, 1009, 180], \"label\": \"Text\"}, {\"bbox_2d\": [25, 292, 1009, 310], \"label\": \"Text\"}, {\"bbox_2d\": [25, 338, 1009, 368], \"label\": \"Text\"}, {\"bbox_2d\": [25, 381, 1009, 400], \"label\": \"Text\"}, {\"bbox_2d\": [25, 413, 1000, 423], \"label\": \"Text\"}, {\"bbox_2d\": [25, 436, 966, 446], \"label\": \"Text\"}, {\"bbox_2d\": [25, 474, 1009, 491], \"label\": \"Text\"}, {\"bbox_2d\": [25, 526, 1009, 544], \"label\": \"Text\"}, {\"bbox_2d\": [25, 460, 348, 471], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 509, 300, 520], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 147, 323, 158], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 325, 242, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 186, 1009, 274], \"label\": \"Table\"}, {\"bbox_2d\": [861, 27, 1009, 36], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [861, 24, 1009, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 146, 323, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 324, 242, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 460, 349, 472], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 509, 301, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 524, 1009, 544], \"label\": \"Text\"}, {\"bbox_2d\": [25, 472, 1009, 492], \"label\": \"Text\"}, {\"bbox_2d\": [25, 413, 1000, 423], \"label\": \"Text\"}, {\"bbox_2d\": [25, 380, 1009, 400], \"label\": \"Text\"}, {\"bbox_2d\": [25, 437, 967, 447], \"label\": \"Text\"}, {\"bbox_2d\": [25, 337, 1009, 367], \"label\": \"Text\"}, {\"bbox_2d\": [25, 291, 1009, 311], \"label\": \"Text\"}, {\"bbox_2d\": [25, 159, 1009, 179], \"label\": \"Text\"}, {\"bbox_2d\": [25, 114, 1009, 133], \"label\": \"Text\"}, {\"bbox_2d\": [22, 182, 1009, 274], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [255, 93, 888, 121], \"label\": \"Text\"}, {\"bbox_2d\": [255, 131, 892, 160], \"label\": \"Text\"}, {\"bbox_2d\": [255, 607, 918, 651], \"label\": \"Text\"}, {\"bbox_2d\": [255, 817, 926, 845], \"label\": \"Text\"}, {\"bbox_2d\": [255, 749, 526, 757], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 574, 516, 584], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 779, 905, 807], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 855, 926, 883], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 698, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 177, 927, 570], \"label\": \"Picture\"}, {\"bbox_2d\": [254, 672, 924, 739], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [255, 93, 888, 121], \"label\": \"Text\"}, {\"bbox_2d\": [255, 131, 892, 159], \"label\": \"Text\"}, {\"bbox_2d\": [255, 608, 917, 651], \"label\": \"Text\"}, {\"bbox_2d\": [255, 817, 926, 844], \"label\": \"Text\"}, {\"bbox_2d\": [231, 855, 926, 882], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 779, 903, 806], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 988, 694, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 745, 526, 756], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 574, 516, 584], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 176, 929, 569], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 672, 925, 738], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [199, 455, 872, 597], \"label\": \"Table\"}, {\"bbox_2d\": [112, 944, 131, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [198, 383, 467, 399], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 959, 142, 1017], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [198, 386, 470, 399], \"label\": \"Section-header\"}, {\"bbox_2d\": [199, 457, 875, 597], \"label\": \"Table\"}, {\"bbox_2d\": [113, 948, 134, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [40, 961, 146, 1018], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 83, 502, 125], \"label\": \"Text\"}, {\"bbox_2d\": [121, 139, 482, 180], \"label\": \"Text\"}, {\"bbox_2d\": [121, 194, 480, 264], \"label\": \"Text\"}, {\"bbox_2d\": [121, 278, 507, 304], \"label\": \"Text\"}, {\"bbox_2d\": [121, 318, 502, 387], \"label\": \"Text\"}, {\"bbox_2d\": [121, 401, 507, 458], \"label\": \"Text\"}, {\"bbox_2d\": [121, 472, 490, 512], \"label\": \"Text\"}, {\"bbox_2d\": [121, 552, 504, 607], \"label\": \"Text\"}, {\"bbox_2d\": [121, 621, 492, 661], \"label\": \"Text\"}, {\"bbox_2d\": [121, 675, 482, 686], \"label\": \"Text\"}, {\"bbox_2d\": [121, 700, 503, 773], \"label\": \"Text\"}, {\"bbox_2d\": [121, 787, 444, 797], \"label\": \"Text\"}, {\"bbox_2d\": [121, 812, 448, 823], \"label\": \"Text\"}, {\"bbox_2d\": [121, 838, 484, 864], \"label\": \"Text\"}, {\"bbox_2d\": [121, 879, 505, 935], \"label\": \"Text\"}, {\"bbox_2d\": [547, 84, 926, 157], \"label\": \"Text\"}, {\"bbox_2d\": [547, 169, 885, 195], \"label\": \"Text\"}, {\"bbox_2d\": [547, 208, 922, 248], \"label\": \"Text\"}, {\"bbox_2d\": [547, 262, 919, 304], \"label\": \"Text\"}, {\"bbox_2d\": [547, 318, 927, 344], \"label\": \"Text\"}, {\"bbox_2d\": [547, 358, 928, 442], \"label\": \"Text\"}, {\"bbox_2d\": [547, 456, 922, 497], \"label\": \"Text\"}, {\"bbox_2d\": [547, 511, 889, 537], \"label\": \"Text\"}, {\"bbox_2d\": [547, 551, 908, 607], \"label\": \"Text\"}, {\"bbox_2d\": [547, 621, 930, 647], \"label\": \"Text\"}, {\"bbox_2d\": [547, 661, 901, 701], \"label\": \"Text\"}, {\"bbox_2d\": [547, 715, 903, 785], \"label\": \"Text\"}, {\"bbox_2d\": [547, 800, 731, 811], \"label\": \"Text\"}, {\"bbox_2d\": [547, 825, 907, 836], \"label\": \"Text\"}, {\"bbox_2d\": [547, 850, 929, 877], \"label\": \"Text\"}, {\"bbox_2d\": [547, 892, 714, 903], \"label\": \"Text\"}, {\"bbox_2d\": [547, 917, 874, 929], \"label\": \"Text\"}, {\"bbox_2d\": [831, 979, 931, 990], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 85, 501, 125], \"label\": \"Text\"}, {\"bbox_2d\": [121, 140, 480, 180], \"label\": \"Text\"}, {\"bbox_2d\": [121, 194, 501, 264], \"label\": \"Text\"}, {\"bbox_2d\": [121, 278, 506, 304], \"label\": \"Text\"}, {\"bbox_2d\": [121, 318, 501, 388], \"label\": \"Text\"}, {\"bbox_2d\": [121, 402, 506, 457], \"label\": \"Text\"}, {\"bbox_2d\": [121, 471, 491, 512], \"label\": \"Text\"}, {\"bbox_2d\": [121, 526, 313, 538], \"label\": \"Text\"}, {\"bbox_2d\": [121, 552, 505, 607], \"label\": \"Text\"}, {\"bbox_2d\": [121, 621, 493, 662], \"label\": \"Text\"}, {\"bbox_2d\": [121, 676, 483, 688], \"label\": \"Text\"}, {\"bbox_2d\": [121, 702, 504, 772], \"label\": \"Text\"}, {\"bbox_2d\": [121, 786, 445, 798], \"label\": \"Text\"}, {\"bbox_2d\": [121, 812, 448, 824], \"label\": \"Text\"}, {\"bbox_2d\": [121, 838, 484, 864], \"label\": \"Text\"}, {\"bbox_2d\": [121, 878, 505, 933], \"label\": \"Text\"}, {\"bbox_2d\": [547, 919, 874, 931], \"label\": \"Text\"}, {\"bbox_2d\": [547, 893, 715, 905], \"label\": \"Text\"}, {\"bbox_2d\": [547, 852, 930, 879], \"label\": \"Text\"}, {\"bbox_2d\": [547, 826, 905, 838], \"label\": \"Text\"}, {\"bbox_2d\": [547, 800, 730, 812], \"label\": \"Text\"}, {\"bbox_2d\": [547, 717, 901, 786], \"label\": \"Text\"}, {\"bbox_2d\": [547, 662, 901, 702], \"label\": \"Text\"}, {\"bbox_2d\": [547, 621, 930, 648], \"label\": \"Text\"}, {\"bbox_2d\": [547, 552, 910, 607], \"label\": \"Text\"}, {\"bbox_2d\": [547, 512, 889, 538], \"label\": \"Text\"}, {\"bbox_2d\": [547, 457, 921, 497], \"label\": \"Text\"}, {\"bbox_2d\": [547, 359, 927, 443], \"label\": \"Text\"}, {\"bbox_2d\": [547, 318, 926, 345], \"label\": \"Text\"}, {\"bbox_2d\": [547, 264, 919, 304], \"label\": \"Text\"}, {\"bbox_2d\": [547, 209, 922, 249], \"label\": \"Text\"}, {\"bbox_2d\": [547, 168, 886, 195], \"label\": \"Text\"}, {\"bbox_2d\": [547, 85, 926, 154], \"label\": \"Text\"}, {\"bbox_2d\": [831, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [537, 304, 740, 319], \"label\": \"Section-header\"}, {\"bbox_2d\": [82, 327, 279, 341], \"label\": \"Section-header\"}, {\"bbox_2d\": [537, 328, 953, 486], \"label\": \"Text\"}, {\"bbox_2d\": [82, 349, 491, 486], \"label\": \"Text\"}, {\"bbox_2d\": [82, 776, 526, 868], \"label\": \"Text\"}, {\"bbox_2d\": [82, 750, 245, 764], \"label\": \"Section-header\"}, {\"bbox_2d\": [822, 1004, 954, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [82, 910, 164, 972], \"label\": \"Picture\"}, {\"bbox_2d\": [353, 923, 425, 962], \"label\": \"Picture\"}, {\"bbox_2d\": [612, 910, 670, 981], \"label\": \"Picture\"}, {\"bbox_2d\": [776, 729, 922, 807], \"label\": \"Picture\"}, {\"bbox_2d\": [550, 814, 695, 873], \"label\": \"Picture\"}, {\"bbox_2d\": [712, 786, 861, 849], \"label\": \"Picture\"}, {\"bbox_2d\": [879, 735, 910, 750], \"label\": \"Picture\"}, {\"bbox_2d\": [537, 524, 948, 712], \"label\": \"Picture\"}, {\"bbox_2d\": [0, 524, 521, 714], \"label\": \"Picture\"}, {\"bbox_2d\": [55, 92, 950, 191], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [538, 306, 740, 319], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 327, 278, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 350, 491, 486], \"label\": \"Text\"}, {\"bbox_2d\": [538, 329, 952, 486], \"label\": \"Text\"}, {\"bbox_2d\": [81, 773, 526, 867], \"label\": \"Text\"}, {\"bbox_2d\": [81, 750, 244, 763], \"label\": \"Section-header\"}, {\"bbox_2d\": [545, 724, 924, 891], \"label\": \"Picture\"}, {\"bbox_2d\": [603, 910, 674, 975], \"label\": \"Picture\"}, {\"bbox_2d\": [854, 927, 957, 956], \"label\": \"Picture\"}, {\"bbox_2d\": [348, 922, 427, 962], \"label\": \"Picture\"}, {\"bbox_2d\": [74, 911, 161, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [0, 524, 1035, 715], \"label\": \"Picture\"}, {\"bbox_2d\": [57, 93, 950, 190], \"label\": \"List-item\"}, {\"bbox_2d\": [822, 1005, 924, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [941, 1005, 955, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [946, 38, 953, 48], \"label\": \"Page-header\"}, {\"bbox_2d\": [82, 75, 508, 374], \"label\": \"Text\"}, {\"bbox_2d\": [82, 389, 508, 818], \"label\": \"Text\"}, {\"bbox_2d\": [528, 435, 953, 501], \"label\": \"Text\"}, {\"bbox_2d\": [528, 518, 953, 741], \"label\": \"Text\"}, {\"bbox_2d\": [82, 889, 508, 980], \"label\": \"Text\"}, {\"bbox_2d\": [528, 758, 953, 980], \"label\": \"Text\"}, {\"bbox_2d\": [82, 857, 329, 868], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 381, 892, 391], \"label\": \"Caption\"}, {\"bbox_2d\": [544, 82, 930, 358], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [947, 39, 953, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [82, 76, 507, 374], \"label\": \"Text\"}, {\"bbox_2d\": [82, 389, 507, 818], \"label\": \"Text\"}, {\"bbox_2d\": [82, 889, 507, 979], \"label\": \"Text\"}, {\"bbox_2d\": [528, 436, 953, 500], \"label\": \"Text\"}, {\"bbox_2d\": [528, 519, 953, 739], \"label\": \"Text\"}, {\"bbox_2d\": [528, 759, 953, 979], \"label\": \"Text\"}, {\"bbox_2d\": [82, 857, 328, 868], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 381, 892, 390], \"label\": \"Caption\"}, {\"bbox_2d\": [544, 81, 931, 361], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [654, 37, 898, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 37, 319, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 69, 898, 98], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 119, 898, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 221, 898, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 272, 898, 319], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 340, 898, 371], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 528, 898, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 613, 898, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 665, 898, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 716, 898, 747], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 768, 386, 783], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 803, 898, 852], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 872, 898, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 973, 549, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [202, 392, 851, 405], \"label\": \"List-item\"}, {\"bbox_2d\": [202, 425, 898, 456], \"label\": \"List-item\"}, {\"bbox_2d\": [202, 476, 898, 507], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [138, 35, 322, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [654, 35, 903, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 65, 902, 98], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 116, 903, 200], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 218, 903, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 269, 902, 320], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 337, 902, 371], \"label\": \"List-item\"}, {\"bbox_2d\": [199, 389, 857, 405], \"label\": \"List-item\"}, {\"bbox_2d\": [199, 423, 902, 456], \"label\": \"List-item\"}, {\"bbox_2d\": [199, 474, 902, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 525, 903, 559], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 577, 823, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 611, 902, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 662, 902, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 713, 903, 746], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 765, 392, 784], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 801, 903, 852], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 869, 903, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [534, 972, 554, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [630, 122, 960, 135], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 164, 962, 209], \"label\": \"Text\"}, {\"bbox_2d\": [125, 266, 905, 375], \"label\": \"Text\"}, {\"bbox_2d\": [125, 392, 948, 499], \"label\": \"Text\"}, {\"bbox_2d\": [125, 518, 904, 532], \"label\": \"Text\"}, {\"bbox_2d\": [458, 988, 631, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [867, 988, 960, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [125, 975, 191, 998], \"label\": \"Picture\"}, {\"bbox_2d\": [139, 56, 372, 114], \"label\": \"Picture\"}, {\"bbox_2d\": [139, 241, 518, 259], \"label\": \"Section-header\"}, {\"bbox_2d\": [420, 282, 667, 374], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [142, 53, 376, 121], \"label\": \"Picture\"}, {\"bbox_2d\": [628, 122, 957, 134], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 162, 967, 209], \"label\": \"Text\"}, {\"bbox_2d\": [125, 265, 912, 280], \"label\": \"Text\"}, {\"bbox_2d\": [399, 280, 724, 375], \"label\": \"Text\"}, {\"bbox_2d\": [125, 391, 954, 499], \"label\": \"Text\"}, {\"bbox_2d\": [125, 517, 905, 532], \"label\": \"Text\"}, {\"bbox_2d\": [138, 239, 526, 261], \"label\": \"Section-header\"}, {\"bbox_2d\": [458, 987, 631, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [867, 987, 966, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [137, 974, 193, 1001], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 37, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 126], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 129, 836, 140], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 143, 854, 154], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 157, 951, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 196, 951, 221], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 224, 951, 249], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 252, 951, 277], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 280, 951, 305], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 308, 951, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 347, 951, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 375, 951, 400], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 403, 951, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 431, 951, 456], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 459, 951, 484], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 487, 951, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 526, 951, 551], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 554, 951, 579], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 582, 951, 607], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 610, 951, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 638, 506, 674], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 677, 506, 702], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 705, 506, 729], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 732, 506, 757], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 760, 506, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 689, 951, 725], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 728, 951, 749], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 752, 951, 777], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 780, 951, 805], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 71, 506, 96], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 100, 506, 126], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 129, 506, 154], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 157, 506, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 185, 506, 209], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 212, 506, 237], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 240, 506, 265], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 268, 506, 293], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 296, 506, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 347, 506, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 375, 506, 400], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 403, 506, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 431, 506, 456], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 459, 506, 495], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 498, 506, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 526, 506, 562], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 565, 506, 588], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 72, 506, 97], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 99, 505, 125], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 127, 506, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 154, 506, 179], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 181, 506, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 209, 506, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 236, 506, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 263, 506, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 72, 951, 111], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 113, 835, 125], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 127, 951, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 154, 875, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 168, 951, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 195, 951, 220], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 222, 951, 248], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 250, 951, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 277, 951, 303], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 304, 951, 346], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 347, 951, 373], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 291, 506, 316], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 318, 506, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 345, 505, 371], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 373, 410, 385], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 387, 506, 412], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 375, 895, 387], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 388, 951, 414], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 414, 506, 439], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 416, 951, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 443, 951, 469], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 441, 506, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 469, 505, 494], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 471, 951, 496], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 498, 951, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 496, 506, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 525, 951, 551], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 553, 951, 578], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 580, 951, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 537, 506, 562], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 564, 506, 590], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 592, 506, 617], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 607, 951, 633], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 635, 951, 660], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 619, 506, 645], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 647, 437, 658], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 660, 506, 686], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 24, 351, 38], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 92, 348, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 215, 454, 232], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 122, 944, 149], \"label\": \"Text\"}, {\"bbox_2d\": [91, 157, 944, 195], \"label\": \"Text\"}, {\"bbox_2d\": [91, 237, 950, 717], \"label\": \"Table\"}, {\"bbox_2d\": [91, 722, 944, 760], \"label\": \"Text\"}, {\"bbox_2d\": [91, 758, 320, 769], \"label\": \"Text\"}, {\"bbox_2d\": [91, 998, 445, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 348, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 217, 453, 232], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 124, 944, 148], \"label\": \"Text\"}, {\"bbox_2d\": [91, 158, 944, 197], \"label\": \"Text\"}, {\"bbox_2d\": [91, 726, 944, 757], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 761, 320, 769], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [94, 238, 949, 718], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [51, 29, 288, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [51, 57, 700, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [298, 173, 800, 257], \"label\": \"Text\"}, {\"bbox_2d\": [51, 343, 262, 561], \"label\": \"Text\"}, {\"bbox_2d\": [51, 574, 258, 696], \"label\": \"Text\"}, {\"bbox_2d\": [51, 709, 261, 835], \"label\": \"Text\"}, {\"bbox_2d\": [51, 843, 261, 983], \"label\": \"Text\"}, {\"bbox_2d\": [292, 924, 503, 964], \"label\": \"Text\"}, {\"bbox_2d\": [292, 843, 503, 891], \"label\": \"Text\"}, {\"bbox_2d\": [528, 843, 729, 950], \"label\": \"Text\"}, {\"bbox_2d\": [528, 609, 738, 810], \"label\": \"Text\"}, {\"bbox_2d\": [528, 501, 740, 596], \"label\": \"Text\"}, {\"bbox_2d\": [528, 378, 739, 472], \"label\": \"Text\"}, {\"bbox_2d\": [528, 338, 728, 366], \"label\": \"Text\"}, {\"bbox_2d\": [528, 441, 969, 582], \"label\": \"Text\"}, {\"bbox_2d\": [768, 338, 971, 427], \"label\": \"Text\"}, {\"bbox_2d\": [768, 593, 981, 715], \"label\": \"Text\"}, {\"bbox_2d\": [768, 730, 968, 852], \"label\": \"Text\"}, {\"bbox_2d\": [768, 864, 968, 891], \"label\": \"Text\"}, {\"bbox_2d\": [768, 899, 973, 987], \"label\": \"Text\"}, {\"bbox_2d\": [292, 906, 452, 916], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 338, 461, 365], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 370, 500, 574], \"label\": \"Text\"}, {\"bbox_2d\": [292, 587, 492, 760], \"label\": \"Text\"}, {\"bbox_2d\": [292, 799, 492, 891], \"label\": \"Text\"}, {\"bbox_2d\": [528, 824, 656, 835], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 961, 730, 987], \"label\": \"Text\"}, {\"bbox_2d\": [49, 1004, 222, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [49, 1006, 112, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [49, 1007, 112, 1016], \"label\": \"Page-footer\"}, {\"bbox_2d\": [50, 168, 267, 325], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [51, 28, 288, 43], \"label\": \"Page-header\"}, {\"bbox_2d\": [51, 59, 717, 119], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 172, 810, 257], \"label\": \"Text\"}, {\"bbox_2d\": [53, 343, 262, 561], \"label\": \"Text\"}, {\"bbox_2d\": [292, 371, 500, 573], \"label\": \"Text\"}, {\"bbox_2d\": [530, 338, 727, 365], \"label\": \"Text\"}, {\"bbox_2d\": [530, 378, 743, 596], \"label\": \"Text\"}, {\"bbox_2d\": [769, 338, 975, 428], \"label\": \"Text\"}, {\"bbox_2d\": [769, 442, 972, 580], \"label\": \"Text\"}, {\"bbox_2d\": [53, 574, 257, 696], \"label\": \"Text\"}, {\"bbox_2d\": [292, 586, 493, 788], \"label\": \"Text\"}, {\"bbox_2d\": [53, 710, 265, 832], \"label\": \"Text\"}, {\"bbox_2d\": [53, 845, 264, 983], \"label\": \"Text\"}, {\"bbox_2d\": [292, 921, 503, 964], \"label\": \"Text\"}, {\"bbox_2d\": [530, 961, 731, 987], \"label\": \"Text\"}, {\"bbox_2d\": [530, 841, 734, 947], \"label\": \"Text\"}, {\"bbox_2d\": [530, 609, 742, 811], \"label\": \"Text\"}, {\"bbox_2d\": [769, 593, 980, 715], \"label\": \"Text\"}, {\"bbox_2d\": [769, 729, 961, 851], \"label\": \"Text\"}, {\"bbox_2d\": [769, 864, 975, 987], \"label\": \"Text\"}, {\"bbox_2d\": [119, 1004, 222, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [38, 1003, 44, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [64, 1001, 117, 1015], \"label\": \"Picture\"}, {\"bbox_2d\": [292, 801, 487, 892], \"label\": \"Text\"}, {\"bbox_2d\": [530, 825, 657, 836], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 906, 451, 916], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 338, 462, 365], \"label\": \"Section-header\"}, {\"bbox_2d\": [50, 168, 265, 324], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 536, 110], \"label\": \"Text\"}, {\"bbox_2d\": [121, 122, 911, 166], \"label\": \"Text\"}, {\"bbox_2d\": [147, 177, 910, 192], \"label\": \"Text\"}, {\"bbox_2d\": [121, 204, 911, 247], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 95, 536, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 118, 910, 165], \"label\": \"Text\"}, {\"bbox_2d\": [147, 177, 910, 193], \"label\": \"Text\"}, {\"bbox_2d\": [121, 204, 911, 248], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 583, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 113, 898, 172], \"label\": \"Text\"}, {\"bbox_2d\": [231, 188, 904, 232], \"label\": \"Text\"}, {\"bbox_2d\": [231, 242, 921, 285], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 547, 589, 559], \"label\": \"Caption\"}, {\"bbox_2d\": [758, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [230, 304, 915, 544], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 584, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 113, 903, 172], \"label\": \"Text\"}, {\"bbox_2d\": [231, 189, 907, 232], \"label\": \"Text\"}, {\"bbox_2d\": [231, 242, 926, 286], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 303, 916, 546], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 547, 593, 558], \"label\": \"Caption\"}, {\"bbox_2d\": [758, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 150, 112], \"label\": \"Text\"}, {\"bbox_2d\": [121, 246, 264, 261], \"label\": \"Text\"}, {\"bbox_2d\": [121, 318, 150, 334], \"label\": \"Text\"}, {\"bbox_2d\": [121, 368, 688, 411], \"label\": \"Text\"}, {\"bbox_2d\": [121, 497, 911, 618], \"label\": \"Text\"}, {\"bbox_2d\": [121, 649, 811, 692], \"label\": \"Text\"}, {\"bbox_2d\": [121, 753, 711, 771], \"label\": \"Text\"}, {\"bbox_2d\": [121, 829, 409, 845], \"label\": \"Text\"}, {\"bbox_2d\": [121, 905, 811, 922], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [284, 698, 911, 764], \"label\": \"Formula\"}, {\"bbox_2d\": [378, 787, 911, 845], \"label\": \"Formula\"}, {\"bbox_2d\": [435, 865, 911, 883], \"label\": \"Formula\"}, {\"bbox_2d\": [284, 575, 911, 649], \"label\": \"Formula\"}, {\"bbox_2d\": [146, 450, 385, 463], \"label\": \"Section-header\"}, {\"bbox_2d\": [275, 271, 911, 322], \"label\": \"Formula\"}, {\"bbox_2d\": [430, 348, 911, 400], \"label\": \"Formula\"}, {\"bbox_2d\": [146, 169, 536, 186], \"label\": \"Text\"}, {\"bbox_2d\": [410, 123, 911, 164], \"label\": \"Formula\"}, {\"bbox_2d\": [408, 199, 911, 236], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 151, 110], \"label\": \"Text\"}, {\"bbox_2d\": [147, 171, 562, 185], \"label\": \"Text\"}, {\"bbox_2d\": [121, 246, 262, 261], \"label\": \"Text\"}, {\"bbox_2d\": [121, 322, 151, 336], \"label\": \"Text\"}, {\"bbox_2d\": [121, 397, 685, 411], \"label\": \"Text\"}, {\"bbox_2d\": [121, 495, 911, 589], \"label\": \"Text\"}, {\"bbox_2d\": [121, 649, 911, 690], \"label\": \"Text\"}, {\"bbox_2d\": [121, 751, 710, 772], \"label\": \"Text\"}, {\"bbox_2d\": [121, 829, 409, 843], \"label\": \"Text\"}, {\"bbox_2d\": [121, 904, 840, 919], \"label\": \"Text\"}, {\"bbox_2d\": [439, 124, 910, 158], \"label\": \"Formula\"}, {\"bbox_2d\": [406, 195, 910, 242], \"label\": \"Formula\"}, {\"bbox_2d\": [276, 270, 910, 318], \"label\": \"Formula\"}, {\"bbox_2d\": [430, 350, 910, 384], \"label\": \"Formula\"}, {\"bbox_2d\": [346, 597, 910, 644], \"label\": \"Formula\"}, {\"bbox_2d\": [398, 699, 910, 746], \"label\": \"Formula\"}, {\"bbox_2d\": [386, 789, 910, 808], \"label\": \"Formula\"}, {\"bbox_2d\": [439, 867, 910, 881], \"label\": \"Formula\"}, {\"bbox_2d\": [283, 928, 910, 975], \"label\": \"Formula\"}, {\"bbox_2d\": [502, 989, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 451, 384, 463], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [60, 47, 244, 95], \"label\": \"Picture\"}, {\"bbox_2d\": [62, 114, 244, 263], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [58, 47, 248, 95], \"label\": \"Picture\"}, {\"bbox_2d\": [60, 112, 243, 201], \"label\": \"Text\"}, {\"bbox_2d\": [60, 211, 240, 264], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 47, 568, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 70, 610, 85], \"label\": \"Text\"}, {\"bbox_2d\": [121, 104, 966, 184], \"label\": \"Text\"}, {\"bbox_2d\": [121, 200, 965, 333], \"label\": \"Text\"}, {\"bbox_2d\": [121, 347, 964, 386], \"label\": \"Text\"}, {\"bbox_2d\": [121, 401, 964, 468], \"label\": \"Text\"}, {\"bbox_2d\": [121, 483, 957, 618], \"label\": \"Text\"}, {\"bbox_2d\": [121, 604, 961, 657], \"label\": \"Text\"}, {\"bbox_2d\": [121, 671, 966, 724], \"label\": \"Text\"}, {\"bbox_2d\": [121, 739, 948, 834], \"label\": \"Text\"}, {\"bbox_2d\": [121, 848, 971, 885], \"label\": \"Text\"}, {\"bbox_2d\": [750, 987, 975, 997], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [130, 45, 568, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 70, 610, 85], \"label\": \"Text\"}, {\"bbox_2d\": [121, 103, 976, 184], \"label\": \"Text\"}, {\"bbox_2d\": [121, 197, 980, 333], \"label\": \"Text\"}, {\"bbox_2d\": [121, 346, 975, 386], \"label\": \"Text\"}, {\"bbox_2d\": [121, 400, 973, 467], \"label\": \"Text\"}, {\"bbox_2d\": [121, 480, 966, 589], \"label\": \"Text\"}, {\"bbox_2d\": [121, 602, 973, 656], \"label\": \"Text\"}, {\"bbox_2d\": [121, 669, 978, 723], \"label\": \"Text\"}, {\"bbox_2d\": [121, 736, 955, 831], \"label\": \"Text\"}, {\"bbox_2d\": [121, 844, 983, 885], \"label\": \"Text\"}, {\"bbox_2d\": [749, 987, 976, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 72, 506, 217], \"label\": \"Text\"}, {\"bbox_2d\": [536, 72, 951, 99], \"label\": \"Text\"}, {\"bbox_2d\": [91, 224, 506, 434], \"label\": \"Text\"}, {\"bbox_2d\": [536, 108, 951, 254], \"label\": \"Text\"}, {\"bbox_2d\": [536, 379, 951, 612], \"label\": \"Text\"}, {\"bbox_2d\": [536, 610, 951, 716], \"label\": \"Text\"}, {\"bbox_2d\": [536, 719, 951, 839], \"label\": \"Text\"}, {\"bbox_2d\": [91, 437, 506, 796], \"label\": \"Text\"}, {\"bbox_2d\": [536, 899, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 809, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [937, 38, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [660, 311, 827, 322], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 71, 506, 219], \"label\": \"Text\"}, {\"bbox_2d\": [91, 224, 506, 431], \"label\": \"Text\"}, {\"bbox_2d\": [91, 436, 506, 794], \"label\": \"Text\"}, {\"bbox_2d\": [91, 799, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 844, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 719, 951, 837], \"label\": \"Text\"}, {\"bbox_2d\": [536, 609, 951, 712], \"label\": \"Text\"}, {\"bbox_2d\": [536, 349, 951, 602], \"label\": \"Text\"}, {\"bbox_2d\": [660, 312, 827, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 106, 951, 254], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 99], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 275, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 259, 518, 273], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 292, 424, 303], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 112, 718, 123], \"label\": \"Text\"}, {\"bbox_2d\": [121, 185, 910, 210], \"label\": \"Text\"}, {\"bbox_2d\": [121, 227, 776, 238], \"label\": \"Text\"}, {\"bbox_2d\": [121, 308, 910, 361], \"label\": \"Text\"}, {\"bbox_2d\": [121, 379, 910, 419], \"label\": \"Text\"}, {\"bbox_2d\": [121, 436, 910, 447], \"label\": \"Text\"}, {\"bbox_2d\": [154, 127, 773, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 142, 846, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 157, 634, 168], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 941, 537, 954], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 278, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 254, 522, 274], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 289, 428, 304], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 108, 721, 123], \"label\": \"Text\"}, {\"bbox_2d\": [121, 182, 914, 211], \"label\": \"Text\"}, {\"bbox_2d\": [121, 225, 776, 240], \"label\": \"Text\"}, {\"bbox_2d\": [121, 304, 913, 362], \"label\": \"Text\"}, {\"bbox_2d\": [121, 376, 913, 419], \"label\": \"Text\"}, {\"bbox_2d\": [121, 433, 912, 448], \"label\": \"Text\"}, {\"bbox_2d\": [152, 122, 775, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 137, 850, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 152, 637, 168], \"label\": \"List-item\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [192, 106, 880, 130], \"label\": \"Section-header\"}, {\"bbox_2d\": [480, 153, 901, 166], \"label\": \"Text\"}, {\"bbox_2d\": [133, 194, 879, 268], \"label\": \"Text\"}, {\"bbox_2d\": [152, 295, 372, 308], \"label\": \"Text\"}, {\"bbox_2d\": [133, 316, 879, 411], \"label\": \"Text\"}, {\"bbox_2d\": [133, 416, 889, 469], \"label\": \"Text\"}, {\"bbox_2d\": [133, 519, 864, 589], \"label\": \"Text\"}, {\"bbox_2d\": [152, 497, 320, 510], \"label\": \"Text\"}, {\"bbox_2d\": [152, 599, 838, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 639, 901, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 700, 893, 914], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 961, 522, 975], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [479, 151, 908, 167], \"label\": \"Text\"}, {\"bbox_2d\": [190, 104, 891, 128], \"label\": \"Title\"}, {\"bbox_2d\": [133, 191, 885, 268], \"label\": \"Text\"}, {\"bbox_2d\": [152, 292, 378, 308], \"label\": \"Text\"}, {\"bbox_2d\": [133, 312, 893, 409], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 413, 893, 470], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 514, 870, 591], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 595, 846, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 635, 905, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 696, 896, 914], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 494, 327, 510], \"label\": \"Text\"}, {\"bbox_2d\": [512, 958, 528, 974], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [225, 162, 809, 370], \"label\": \"Picture\"}, {\"bbox_2d\": [225, 390, 809, 550], \"label\": \"Caption\"}, {\"bbox_2d\": [508, 907, 525, 922], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [226, 391, 808, 549], \"label\": \"Caption\"}, {\"bbox_2d\": [508, 910, 525, 921], \"label\": \"Page-footer\"}, {\"bbox_2d\": [225, 165, 808, 372], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 911, 164], \"label\": \"Text\"}, {\"bbox_2d\": [121, 248, 714, 261], \"label\": \"Text\"}, {\"bbox_2d\": [121, 399, 864, 437], \"label\": \"Text\"}, {\"bbox_2d\": [121, 485, 918, 523], \"label\": \"Text\"}, {\"bbox_2d\": [121, 548, 584, 560], \"label\": \"Text\"}, {\"bbox_2d\": [164, 649, 207, 661], \"label\": \"Text\"}, {\"bbox_2d\": [121, 852, 911, 892], \"label\": \"Text\"}, {\"bbox_2d\": [121, 809, 486, 826], \"label\": \"List-item\"}, {\"bbox_2d\": [394, 291, 744, 325], \"label\": \"Formula\"}, {\"bbox_2d\": [259, 378, 780, 444], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 450, 279, 462], \"label\": \"List-item\"}, {\"bbox_2d\": [353, 585, 911, 624], \"label\": \"Formula\"}, {\"bbox_2d\": [168, 676, 828, 764], \"label\": \"Formula\"}, {\"bbox_2d\": [276, 745, 790, 802], \"label\": \"Formula\"}, {\"bbox_2d\": [508, 970, 527, 982], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 274, 169, 286], \"label\": \"Text\"}, {\"bbox_2d\": [301, 189, 911, 227], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 98, 914, 161], \"label\": \"Text\"}, {\"bbox_2d\": [121, 248, 714, 260], \"label\": \"Text\"}, {\"bbox_2d\": [147, 275, 171, 286], \"label\": \"Text\"}, {\"bbox_2d\": [121, 342, 149, 354], \"label\": \"Text\"}, {\"bbox_2d\": [164, 650, 207, 661], \"label\": \"Text\"}, {\"bbox_2d\": [121, 811, 482, 825], \"label\": \"Text\"}, {\"bbox_2d\": [322, 184, 914, 224], \"label\": \"Formula\"}, {\"bbox_2d\": [292, 287, 743, 327], \"label\": \"Formula\"}, {\"bbox_2d\": [256, 355, 779, 395], \"label\": \"Formula\"}, {\"bbox_2d\": [349, 585, 914, 626], \"label\": \"Formula\"}, {\"bbox_2d\": [252, 681, 826, 783], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 450, 308, 462], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 486, 920, 524], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 549, 584, 561], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 407, 865, 433], \"label\": \"Text\"}, {\"bbox_2d\": [121, 852, 914, 890], \"label\": \"Text\"}, {\"bbox_2d\": [139, 903, 567, 914], \"label\": \"Footnote\"}, {\"bbox_2d\": [121, 916, 914, 946], \"label\": \"Footnote\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [266, 103, 893, 152], \"label\": \"Text\"}, {\"bbox_2d\": [231, 172, 926, 221], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 227, 920, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 324, 927, 670], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 668, 727, 683], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 984, 698, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [266, 107, 893, 151], \"label\": \"Text\"}, {\"bbox_2d\": [231, 177, 926, 220], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 230, 919, 305], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 324, 927, 670], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 673, 726, 683], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [114, 18, 232, 125], \"label\": \"Picture\"}, {\"bbox_2d\": [134, 183, 232, 198], \"label\": \"Text\"}, {\"bbox_2d\": [134, 204, 901, 230], \"label\": \"List-item\"}, {\"bbox_2d\": [205, 248, 950, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [205, 291, 901, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [134, 392, 956, 471], \"label\": \"Text\"}, {\"bbox_2d\": [134, 515, 924, 542], \"label\": \"Text\"}, {\"bbox_2d\": [134, 624, 235, 632], \"label\": \"Text\"}, {\"bbox_2d\": [134, 709, 247, 750], \"label\": \"Text\"}, {\"bbox_2d\": [134, 862, 460, 876], \"label\": \"Text\"}, {\"bbox_2d\": [524, 924, 571, 936], \"label\": \"Page-footer\"}, {\"bbox_2d\": [134, 369, 468, 385], \"label\": \"Section-header\"}, {\"bbox_2d\": [134, 492, 207, 507], \"label\": \"Section-header\"}, {\"bbox_2d\": [150, 565, 324, 615], \"label\": \"Picture\"}, {\"bbox_2d\": [146, 648, 290, 686], \"label\": \"Picture\"}, {\"bbox_2d\": [134, 709, 247, 750], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [112, 15, 232, 126], \"label\": \"Picture\"}, {\"bbox_2d\": [133, 156, 212, 177], \"label\": \"Section-header\"}, {\"bbox_2d\": [133, 366, 469, 386], \"label\": \"Section-header\"}, {\"bbox_2d\": [133, 488, 212, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 860, 460, 878], \"label\": \"Page-footer\"}, {\"bbox_2d\": [524, 920, 576, 938], \"label\": \"Page-footer\"}, {\"bbox_2d\": [133, 705, 252, 751], \"label\": \"Caption\"}, {\"bbox_2d\": [133, 616, 238, 633], \"label\": \"Caption\"}, {\"bbox_2d\": [146, 643, 292, 685], \"label\": \"Picture\"}, {\"bbox_2d\": [150, 559, 325, 613], \"label\": \"Picture\"}, {\"bbox_2d\": [133, 512, 929, 545], \"label\": \"Text\"}, {\"bbox_2d\": [133, 391, 962, 472], \"label\": \"Text\"}, {\"bbox_2d\": [133, 180, 233, 197], \"label\": \"Text\"}, {\"bbox_2d\": [169, 200, 905, 231], \"label\": \"List-item\"}, {\"bbox_2d\": [205, 244, 955, 276], \"label\": \"List-item\"}, {\"bbox_2d\": [205, 289, 906, 307], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 320, 927, 352], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [46, 35, 322, 46], \"label\": \"Page-header\"}, {\"bbox_2d\": [94, 119, 533, 147], \"label\": \"Section-header\"}, {\"bbox_2d\": [180, 172, 837, 476], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 500, 269, 511], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 615, 252, 627], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 517, 231, 589], \"label\": \"Text\"}, {\"bbox_2d\": [265, 517, 393, 577], \"label\": \"Text\"}, {\"bbox_2d\": [438, 517, 588, 589], \"label\": \"Text\"}, {\"bbox_2d\": [609, 517, 739, 577], \"label\": \"Text\"}, {\"bbox_2d\": [781, 517, 942, 577], \"label\": \"Text\"}, {\"bbox_2d\": [781, 632, 911, 705], \"label\": \"Text\"}, {\"bbox_2d\": [609, 632, 752, 705], \"label\": \"Text\"}, {\"bbox_2d\": [438, 632, 530, 736], \"label\": \"Text\"}, {\"bbox_2d\": [265, 632, 395, 691], \"label\": \"Text\"}, {\"bbox_2d\": [91, 632, 241, 705], \"label\": \"Text\"}, {\"bbox_2d\": [284, 797, 924, 904], \"label\": \"Text\"}, {\"bbox_2d\": [284, 775, 386, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 886, 198, 971], \"label\": \"Caption\"}, {\"bbox_2d\": [94, 756, 218, 874], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [47, 35, 322, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [94, 112, 534, 148], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 169, 836, 480], \"label\": \"Picture\"}, {\"bbox_2d\": [94, 500, 267, 510], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 616, 252, 626], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 755, 218, 875], \"label\": \"Picture\"}, {\"bbox_2d\": [283, 796, 920, 904], \"label\": \"Text\"}, {\"bbox_2d\": [283, 771, 387, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 945, 189, 971], \"label\": \"Text\"}, {\"bbox_2d\": [781, 517, 940, 573], \"label\": \"Text\"}, {\"bbox_2d\": [609, 516, 740, 573], \"label\": \"Text\"}, {\"bbox_2d\": [437, 517, 589, 586], \"label\": \"Text\"}, {\"bbox_2d\": [266, 516, 393, 573], \"label\": \"Text\"}, {\"bbox_2d\": [86, 517, 232, 587], \"label\": \"Text\"}, {\"bbox_2d\": [94, 632, 242, 702], \"label\": \"Text\"}, {\"bbox_2d\": [266, 632, 394, 689], \"label\": \"Text\"}, {\"bbox_2d\": [437, 632, 530, 703], \"label\": \"Text\"}, {\"bbox_2d\": [437, 709, 522, 735], \"label\": \"Text\"}, {\"bbox_2d\": [609, 632, 753, 702], \"label\": \"Text\"}, {\"bbox_2d\": [94, 884, 197, 938], \"label\": \"Text\"}, {\"bbox_2d\": [781, 632, 869, 644], \"label\": \"Text\"}, {\"bbox_2d\": [781, 648, 909, 659], \"label\": \"Text\"}, {\"bbox_2d\": [781, 663, 907, 675], \"label\": \"Text\"}, {\"bbox_2d\": [781, 679, 906, 702], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 25, 352, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 97, 581, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 165, 203, 177], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 437, 251, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 146, 690, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 124, 672, 135], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 121, 506, 148], \"label\": \"Text\"}, {\"bbox_2d\": [91, 200, 506, 314], \"label\": \"Text\"}, {\"bbox_2d\": [91, 325, 506, 421], \"label\": \"Text\"}, {\"bbox_2d\": [91, 452, 506, 535], \"label\": \"Text\"}, {\"bbox_2d\": [91, 547, 506, 616], \"label\": \"Text\"}, {\"bbox_2d\": [91, 626, 506, 653], \"label\": \"Text\"}, {\"bbox_2d\": [528, 161, 944, 190], \"label\": \"Text\"}, {\"bbox_2d\": [528, 284, 944, 366], \"label\": \"Text\"}, {\"bbox_2d\": [528, 377, 944, 490], \"label\": \"Text\"}, {\"bbox_2d\": [528, 500, 944, 596], \"label\": \"Text\"}, {\"bbox_2d\": [528, 616, 851, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 630, 944, 742], \"label\": \"Text\"}, {\"bbox_2d\": [528, 748, 944, 849], \"label\": \"Text\"}, {\"bbox_2d\": [528, 870, 944, 963], \"label\": \"Text\"}, {\"bbox_2d\": [91, 878, 506, 919], \"label\": \"Text\"}, {\"bbox_2d\": [91, 656, 506, 740], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 740, 506, 868], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 999, 446, 1008], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 98, 581, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 123, 672, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 165, 201, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 188, 228, 198], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 865, 694, 875], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [528, 146, 691, 156], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 122, 507, 147], \"label\": \"Text\"}, {\"bbox_2d\": [91, 202, 507, 313], \"label\": \"Text\"}, {\"bbox_2d\": [91, 325, 507, 422], \"label\": \"Text\"}, {\"bbox_2d\": [91, 439, 251, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 453, 507, 535], \"label\": \"Text\"}, {\"bbox_2d\": [91, 548, 507, 615], \"label\": \"Text\"}, {\"bbox_2d\": [91, 627, 507, 652], \"label\": \"Text\"}, {\"bbox_2d\": [91, 880, 507, 919], \"label\": \"Text\"}, {\"bbox_2d\": [528, 880, 944, 962], \"label\": \"Text\"}, {\"bbox_2d\": [528, 629, 944, 739], \"label\": \"Text\"}, {\"bbox_2d\": [528, 500, 944, 597], \"label\": \"Text\"}, {\"bbox_2d\": [528, 377, 944, 488], \"label\": \"Text\"}, {\"bbox_2d\": [528, 160, 944, 185], \"label\": \"Text\"}, {\"bbox_2d\": [91, 656, 507, 738], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 742, 507, 868], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 189, 606, 199], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 203, 812, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 218, 944, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 247, 944, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 283, 944, 365], \"label\": \"Text\"}, {\"bbox_2d\": [528, 752, 944, 848], \"label\": \"Text\"}, {\"bbox_2d\": [528, 614, 851, 624], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [23, 12, 226, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [23, 28, 873, 58], \"label\": \"Text\"}, {\"bbox_2d\": [23, 82, 1009, 107], \"label\": \"Text\"}, {\"bbox_2d\": [23, 118, 1009, 139], \"label\": \"Text\"}, {\"bbox_2d\": [23, 146, 1009, 171], \"label\": \"Text\"}, {\"bbox_2d\": [23, 207, 1009, 233], \"label\": \"Text\"}, {\"bbox_2d\": [23, 245, 1009, 271], \"label\": \"Text\"}, {\"bbox_2d\": [23, 309, 1009, 335], \"label\": \"Text\"}, {\"bbox_2d\": [23, 299, 1009, 309], \"label\": \"Text\"}, {\"bbox_2d\": [23, 312, 1009, 335], \"label\": \"Text\"}, {\"bbox_2d\": [23, 339, 1009, 350], \"label\": \"Text\"}, {\"bbox_2d\": [23, 353, 1009, 363], \"label\": \"Text\"}, {\"bbox_2d\": [23, 367, 1009, 389], \"label\": \"Text\"}, {\"bbox_2d\": [23, 392, 1009, 417], \"label\": \"Text\"}, {\"bbox_2d\": [23, 421, 1009, 446], \"label\": \"Text\"}, {\"bbox_2d\": [23, 171, 1009, 181], \"label\": \"Section-header\"}, {\"bbox_2d\": [23, 277, 1009, 287], \"label\": \"Section-header\"}, {\"bbox_2d\": [23, 185, 1009, 195], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 349, 522, 360], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [22, 12, 225, 25], \"label\": \"Page-header\"}, {\"bbox_2d\": [22, 38, 871, 45], \"label\": \"Text\"}, {\"bbox_2d\": [22, 54, 1012, 76], \"label\": \"Text\"}, {\"bbox_2d\": [22, 85, 1012, 114], \"label\": \"Text\"}, {\"bbox_2d\": [22, 123, 1012, 138], \"label\": \"Text\"}, {\"bbox_2d\": [22, 147, 1012, 161], \"label\": \"Text\"}, {\"bbox_2d\": [22, 178, 1012, 207], \"label\": \"Text\"}, {\"bbox_2d\": [22, 216, 1012, 238], \"label\": \"Text\"}, {\"bbox_2d\": [22, 254, 1012, 269], \"label\": \"Text\"}, {\"bbox_2d\": [22, 285, 1012, 300], \"label\": \"Text\"}, {\"bbox_2d\": [22, 309, 1012, 330], \"label\": \"Text\"}, {\"bbox_2d\": [22, 170, 131, 177], \"label\": \"Section-header\"}, {\"bbox_2d\": [22, 247, 121, 254], \"label\": \"Section-header\"}, {\"bbox_2d\": [22, 278, 154, 285], \"label\": \"Section-header\"}, {\"bbox_2d\": [511, 350, 522, 357], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [68, 47, 601, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [68, 69, 690, 85], \"label\": \"Text\"}, {\"bbox_2d\": [61, 104, 450, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 129, 295, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 320, 396, 334], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 432, 302, 445], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 598, 264, 612], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 767, 173, 782], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 840, 138, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 888, 278, 902], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 618, 884, 672], \"label\": \"Text\"}, {\"bbox_2d\": [94, 678, 532, 757], \"label\": \"Table\"}, {\"bbox_2d\": [94, 788, 905, 829], \"label\": \"Text\"}, {\"bbox_2d\": [94, 860, 516, 874], \"label\": \"Text\"}, {\"bbox_2d\": [94, 906, 863, 934], \"label\": \"Text\"}, {\"bbox_2d\": [61, 987, 281, 998], \"label\": \"Page-footer\"}, {\"bbox_2d\": [94, 338, 900, 419], \"label\": \"Text\"}, {\"bbox_2d\": [94, 214, 893, 254], \"label\": \"Text\"}, {\"bbox_2d\": [94, 268, 904, 308], \"label\": \"Text\"}, {\"bbox_2d\": [94, 147, 904, 201], \"label\": \"Text\"}, {\"bbox_2d\": [94, 450, 903, 584], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [69, 45, 690, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [61, 102, 455, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 129, 295, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 321, 399, 334], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 146, 912, 200], \"label\": \"Text\"}, {\"bbox_2d\": [94, 213, 901, 254], \"label\": \"Text\"}, {\"bbox_2d\": [94, 267, 913, 307], \"label\": \"Text\"}, {\"bbox_2d\": [94, 338, 902, 419], \"label\": \"Text\"}, {\"bbox_2d\": [94, 449, 918, 585], \"label\": \"Text\"}, {\"bbox_2d\": [94, 617, 896, 671], \"label\": \"Text\"}, {\"bbox_2d\": [94, 598, 265, 611], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 432, 302, 446], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 768, 174, 782], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 841, 139, 855], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 887, 278, 900], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 788, 916, 828], \"label\": \"Text\"}, {\"bbox_2d\": [94, 860, 518, 874], \"label\": \"Text\"}, {\"bbox_2d\": [94, 906, 871, 933], \"label\": \"Text\"}, {\"bbox_2d\": [60, 987, 282, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [99, 676, 530, 757], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [783, 73, 888, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [507, 93, 516, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 308, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 142, 496, 191], \"label\": \"Text\"}, {\"bbox_2d\": [130, 196, 496, 255], \"label\": \"Text\"}, {\"bbox_2d\": [130, 260, 496, 337], \"label\": \"Text\"}, {\"bbox_2d\": [130, 342, 496, 362], \"label\": \"Text\"}, {\"bbox_2d\": [130, 368, 496, 403], \"label\": \"Text\"}, {\"bbox_2d\": [130, 408, 496, 443], \"label\": \"Text\"}, {\"bbox_2d\": [130, 448, 496, 496], \"label\": \"Text\"}, {\"bbox_2d\": [130, 501, 496, 520], \"label\": \"Text\"}, {\"bbox_2d\": [130, 525, 496, 588], \"label\": \"Text\"}, {\"bbox_2d\": [130, 624, 496, 750], \"label\": \"Text\"}, {\"bbox_2d\": [130, 787, 496, 823], \"label\": \"Text\"}, {\"bbox_2d\": [130, 828, 496, 864], \"label\": \"Text\"}, {\"bbox_2d\": [130, 869, 496, 898], \"label\": \"Text\"}, {\"bbox_2d\": [130, 903, 496, 940], \"label\": \"Text\"}, {\"bbox_2d\": [130, 945, 496, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 829, 890, 982], \"label\": \"Text\"}, {\"bbox_2d\": [184, 766, 443, 777], \"label\": \"Section-header\"}, {\"bbox_2d\": [206, 602, 421, 613], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 664, 700, 688], \"label\": \"Text\"}, {\"bbox_2d\": [526, 693, 890, 716], \"label\": \"Text\"}, {\"bbox_2d\": [526, 371, 890, 448], \"label\": \"Text\"}, {\"bbox_2d\": [526, 453, 890, 663], \"label\": \"Text\"}, {\"bbox_2d\": [526, 669, 890, 688], \"label\": \"Text\"}, {\"bbox_2d\": [556, 753, 739, 808], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 142, 890, 179], \"label\": \"Text\"}, {\"bbox_2d\": [526, 184, 890, 219], \"label\": \"Text\"}, {\"bbox_2d\": [526, 225, 890, 255], \"label\": \"Text\"}, {\"bbox_2d\": [526, 260, 890, 289], \"label\": \"Text\"}, {\"bbox_2d\": [526, 294, 890, 337], \"label\": \"Text\"}, {\"bbox_2d\": [590, 353, 829, 362], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [779, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 93, 520, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 142, 500, 191], \"label\": \"Text\"}, {\"bbox_2d\": [130, 194, 495, 217], \"label\": \"Text\"}, {\"bbox_2d\": [130, 220, 500, 243], \"label\": \"Text\"}, {\"bbox_2d\": [132, 246, 469, 256], \"label\": \"Text\"}, {\"bbox_2d\": [131, 260, 500, 283], \"label\": \"Text\"}, {\"bbox_2d\": [130, 287, 500, 336], \"label\": \"Text\"}, {\"bbox_2d\": [131, 339, 500, 363], \"label\": \"Text\"}, {\"bbox_2d\": [131, 366, 501, 402], \"label\": \"Text\"}, {\"bbox_2d\": [130, 406, 503, 442], \"label\": \"Text\"}, {\"bbox_2d\": [130, 446, 503, 495], \"label\": \"Text\"}, {\"bbox_2d\": [131, 499, 500, 522], \"label\": \"Text\"}, {\"bbox_2d\": [130, 526, 500, 588], \"label\": \"Text\"}, {\"bbox_2d\": [130, 623, 501, 751], \"label\": \"Text\"}, {\"bbox_2d\": [131, 786, 500, 823], \"label\": \"Text\"}, {\"bbox_2d\": [131, 826, 500, 862], \"label\": \"Text\"}, {\"bbox_2d\": [131, 866, 500, 902], \"label\": \"Text\"}, {\"bbox_2d\": [131, 906, 500, 942], \"label\": \"Text\"}, {\"bbox_2d\": [131, 946, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 826, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 691, 894, 714], \"label\": \"Text\"}, {\"bbox_2d\": [526, 636, 894, 660], \"label\": \"Text\"}, {\"bbox_2d\": [542, 664, 890, 687], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 609, 891, 632], \"label\": \"Text\"}, {\"bbox_2d\": [526, 452, 895, 605], \"label\": \"Text\"}, {\"bbox_2d\": [526, 373, 895, 448], \"label\": \"Text\"}, {\"bbox_2d\": [526, 301, 895, 337], \"label\": \"Text\"}, {\"bbox_2d\": [526, 261, 895, 297], \"label\": \"Text\"}, {\"bbox_2d\": [526, 221, 895, 257], \"label\": \"Text\"}, {\"bbox_2d\": [526, 181, 895, 218], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 895, 178], \"label\": \"Text\"}, {\"bbox_2d\": [589, 353, 832, 363], \"label\": \"Section-header\"}, {\"bbox_2d\": [205, 603, 425, 613], \"label\": \"Section-header\"}, {\"bbox_2d\": [185, 766, 445, 776], \"label\": \"Section-header\"}, {\"bbox_2d\": [555, 743, 896, 804], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 48, 752, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 105, 511, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 293, 677, 309], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 130, 971, 170], \"label\": \"Text\"}, {\"bbox_2d\": [155, 184, 966, 209], \"label\": \"Text\"}, {\"bbox_2d\": [155, 224, 965, 249], \"label\": \"Text\"}, {\"bbox_2d\": [155, 264, 820, 276], \"label\": \"Text\"}, {\"bbox_2d\": [155, 319, 962, 372], \"label\": \"Text\"}, {\"bbox_2d\": [155, 386, 970, 426], \"label\": \"Text\"}, {\"bbox_2d\": [155, 465, 933, 490], \"label\": \"Text\"}, {\"bbox_2d\": [155, 441, 354, 453], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 500, 279, 512], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 520, 974, 573], \"label\": \"Text\"}, {\"bbox_2d\": [155, 586, 961, 639], \"label\": \"Text\"}, {\"bbox_2d\": [155, 733, 962, 800], \"label\": \"Text\"}, {\"bbox_2d\": [155, 832, 969, 872], \"label\": \"Text\"}, {\"bbox_2d\": [155, 901, 965, 941], \"label\": \"Text\"}, {\"bbox_2d\": [155, 886, 282, 898], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 814, 194, 826], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 717, 284, 729], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 653, 549, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 666, 967, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 693, 583, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [749, 987, 975, 997], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 102, 516, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 291, 677, 309], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 715, 284, 728], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 813, 194, 826], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 884, 281, 897], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 129, 982, 169], \"label\": \"Text\"}, {\"bbox_2d\": [155, 182, 976, 210], \"label\": \"Text\"}, {\"bbox_2d\": [155, 223, 975, 250], \"label\": \"Text\"}, {\"bbox_2d\": [155, 263, 825, 276], \"label\": \"Text\"}, {\"bbox_2d\": [155, 317, 970, 372], \"label\": \"Text\"}, {\"bbox_2d\": [155, 385, 980, 425], \"label\": \"Text\"}, {\"bbox_2d\": [155, 439, 355, 452], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 462, 942, 490], \"label\": \"Text\"}, {\"bbox_2d\": [155, 499, 279, 513], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 517, 984, 571], \"label\": \"Text\"}, {\"bbox_2d\": [155, 584, 972, 638], \"label\": \"Text\"}, {\"bbox_2d\": [155, 651, 552, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 665, 976, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 692, 587, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 732, 972, 800], \"label\": \"Text\"}, {\"bbox_2d\": [155, 830, 980, 871], \"label\": \"Text\"}, {\"bbox_2d\": [155, 901, 974, 941], \"label\": \"Text\"}, {\"bbox_2d\": [749, 987, 976, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [130, 45, 751, 85], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [99, 51, 382, 98], \"label\": \"Text\"}, {\"bbox_2d\": [132, 110, 887, 947], \"label\": \"Picture\"}, {\"bbox_2d\": [382, 957, 654, 972], \"label\": \"Caption\"}, {\"bbox_2d\": [502, 983, 532, 994], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [97, 51, 390, 98], \"label\": \"Page-header\"}, {\"bbox_2d\": [124, 104, 924, 950], \"label\": \"Picture\"}, {\"bbox_2d\": [381, 958, 664, 972], \"label\": \"Caption\"}, {\"bbox_2d\": [503, 981, 540, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [437, 68, 598, 82], \"label\": \"Page-header\"}, {\"bbox_2d\": [273, 137, 761, 155], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 268, 223, 285], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 934, 522, 949], \"label\": \"Page-footer\"}, {\"bbox_2d\": [88, 399, 951, 898], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [436, 60, 602, 85], \"label\": \"Text\"}, {\"bbox_2d\": [88, 256, 224, 290], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 393, 953, 896], \"label\": \"Table\"}, {\"bbox_2d\": [514, 926, 524, 951], \"label\": \"Page-footer\"}, {\"bbox_2d\": [272, 122, 767, 161], \"label\": \"Title\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 47, 298, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 126, 905, 159], \"label\": \"Text\"}, {\"bbox_2d\": [146, 160, 905, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 442, 914, 475], \"label\": \"Text\"}, {\"bbox_2d\": [121, 476, 914, 541], \"label\": \"Text\"}, {\"bbox_2d\": [121, 542, 905, 590], \"label\": \"Text\"}, {\"bbox_2d\": [121, 591, 900, 640], \"label\": \"Text\"}, {\"bbox_2d\": [121, 641, 851, 669], \"label\": \"Text\"}, {\"bbox_2d\": [146, 670, 676, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 691, 914, 724], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 725, 914, 757], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 758, 914, 822], \"label\": \"Text\"}, {\"bbox_2d\": [121, 823, 603, 839], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 840, 362, 871], \"label\": \"Text\"}, {\"bbox_2d\": [146, 872, 430, 888], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 889, 465, 905], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 906, 906, 938], \"label\": \"Text\"}, {\"bbox_2d\": [121, 974, 440, 988], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"Text\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"Text\"}, {\"bbox_2d\": [121, 126, 929, 159], \"label\": \"Text\"}, {\"bbox_2d\": [121, 441, 937, 474], \"label\": \"Text\"}, {\"bbox_2d\": [121, 474, 931, 540], \"label\": \"Text\"}, {\"bbox_2d\": [121, 540, 915, 589], \"label\": \"Text\"}, {\"bbox_2d\": [121, 590, 930, 639], \"label\": \"Text\"}, {\"bbox_2d\": [121, 639, 882, 672], \"label\": \"Text\"}, {\"bbox_2d\": [121, 755, 930, 821], \"label\": \"Text\"}, {\"bbox_2d\": [121, 838, 926, 871], \"label\": \"Text\"}, {\"bbox_2d\": [121, 904, 932, 937], \"label\": \"Text\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}, {\"bbox_2d\": [145, 672, 696, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 689, 954, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 722, 935, 755], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 871, 440, 887], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 888, 477, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 159, 946, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [178, 193, 576, 209], \"label\": \"List-item\"}, {\"bbox_2d\": [174, 209, 515, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [170, 226, 543, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 242, 558, 258], \"label\": \"List-item\"}, {\"bbox_2d\": [173, 259, 362, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 275, 565, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [164, 292, 386, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 307, 377, 325], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 325, 381, 341], \"label\": \"List-item\"}, {\"bbox_2d\": [173, 342, 536, 358], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 358, 432, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [164, 375, 538, 391], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 391, 682, 407], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 408, 914, 440], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 821, 613, 837], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 908, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 688, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 563, 502, 575], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 142, 927, 558], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 906, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [254, 138, 925, 561], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 564, 501, 574], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 685, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 99, 594, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 132, 410, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 278, 590, 288], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 501, 323, 512], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 695, 264, 706], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 423, 913, 481], \"label\": \"Text\"}, {\"bbox_2d\": [121, 719, 913, 842], \"label\": \"Text\"}, {\"bbox_2d\": [146, 146, 912, 246], \"label\": \"Table\"}, {\"bbox_2d\": [121, 308, 912, 402], \"label\": \"Table\"}, {\"bbox_2d\": [146, 531, 912, 678], \"label\": \"Table\"}, {\"bbox_2d\": [498, 941, 537, 954], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 599, 113], \"label\": \"Page-header\"}, {\"bbox_2d\": [150, 144, 915, 242], \"label\": \"Table\"}, {\"bbox_2d\": [121, 129, 414, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 273, 590, 289], \"label\": \"Section-header\"}, {\"bbox_2d\": [146, 304, 916, 406], \"label\": \"Table\"}, {\"bbox_2d\": [121, 417, 922, 482], \"label\": \"Text\"}, {\"bbox_2d\": [121, 497, 324, 514], \"label\": \"Section-header\"}, {\"bbox_2d\": [144, 529, 915, 677], \"label\": \"Table\"}, {\"bbox_2d\": [121, 691, 268, 708], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 715, 919, 843], \"label\": \"Text\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [478, 105, 548, 119], \"label\": \"Caption\"}, {\"bbox_2d\": [242, 119, 787, 356], \"label\": \"Table\"}, {\"bbox_2d\": [121, 382, 374, 397], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 457, 205, 473], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 515, 805, 530], \"label\": \"Text\"}, {\"bbox_2d\": [121, 687, 863, 757], \"label\": \"Text\"}, {\"bbox_2d\": [121, 615, 205, 631], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 842, 205, 857], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 912, 790, 956], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [496, 986, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 911, 790, 957], \"label\": \"Text\"}, {\"bbox_2d\": [121, 685, 862, 758], \"label\": \"Text\"}, {\"bbox_2d\": [121, 513, 805, 532], \"label\": \"Text\"}, {\"bbox_2d\": [247, 120, 788, 355], \"label\": \"Table\"}, {\"bbox_2d\": [478, 104, 548, 120], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 384, 373, 397], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 459, 204, 472], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 617, 204, 630], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 843, 204, 856], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [75, 24, 263, 41], \"label\": \"Text\"}, {\"bbox_2d\": [531, 0, 951, 13], \"label\": \"Page-header\"}, {\"bbox_2d\": [41, 70, 994, 196], \"label\": \"Text\"}, {\"bbox_2d\": [41, 224, 994, 296], \"label\": \"Text\"}, {\"bbox_2d\": [41, 324, 994, 368], \"label\": \"Text\"}, {\"bbox_2d\": [41, 396, 994, 605], \"label\": \"Text\"}, {\"bbox_2d\": [462, 633, 573, 650], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 679, 652, 695], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 723, 994, 796], \"label\": \"Text\"}, {\"bbox_2d\": [41, 824, 994, 868], \"label\": \"Text\"}, {\"bbox_2d\": [78, 897, 345, 915], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 942, 178, 960], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 988, 250, 1006], \"label\": \"List-item\"}, {\"bbox_2d\": [928, 1020, 951, 1036], \"label\": \"Page-footer\"}, {\"bbox_2d\": [2, 1020, 58, 1036], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [0, 0, 510, 13], \"label\": \"Page-header\"}, {\"bbox_2d\": [530, 0, 1035, 13], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 23, 264, 41], \"label\": \"Text\"}, {\"bbox_2d\": [42, 69, 993, 195], \"label\": \"Text\"}, {\"bbox_2d\": [42, 223, 993, 295], \"label\": \"Text\"}, {\"bbox_2d\": [42, 323, 993, 368], \"label\": \"Text\"}, {\"bbox_2d\": [42, 396, 993, 603], \"label\": \"Text\"}, {\"bbox_2d\": [42, 723, 993, 795], \"label\": \"Text\"}, {\"bbox_2d\": [42, 823, 993, 868], \"label\": \"Text\"}, {\"bbox_2d\": [462, 631, 573, 649], \"label\": \"Section-header\"}, {\"bbox_2d\": [382, 677, 653, 695], \"label\": \"Section-header\"}, {\"bbox_2d\": [0, 1022, 56, 1036], \"label\": \"Page-footer\"}, {\"bbox_2d\": [929, 1022, 1036, 1036], \"label\": \"Page-footer\"}, {\"bbox_2d\": [78, 896, 344, 913], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 942, 177, 959], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 987, 250, 1005], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [199, 109, 886, 174], \"label\": \"Text\"}, {\"bbox_2d\": [151, 211, 886, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 312, 886, 375], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 414, 886, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 644, 886, 707], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 745, 886, 834], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [198, 108, 887, 173], \"label\": \"Text\"}, {\"bbox_2d\": [148, 210, 887, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [148, 312, 886, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 414, 889, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 642, 885, 707], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 744, 887, 834], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 97, 914, 122], \"label\": \"Text\"}, {\"bbox_2d\": [121, 139, 549, 149], \"label\": \"Text\"}, {\"bbox_2d\": [119, 151, 671, 680], \"label\": \"Table\"}, {\"bbox_2d\": [497, 941, 539, 954], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 913, 123], \"label\": \"Text\"}, {\"bbox_2d\": [121, 152, 672, 681], \"label\": \"Table\"}, {\"bbox_2d\": [121, 137, 552, 152], \"label\": \"Caption\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 61, 1009, 108], \"label\": \"Text\"}, {\"bbox_2d\": [25, 113, 1009, 175], \"label\": \"Text\"}, {\"bbox_2d\": [25, 175, 1009, 234], \"label\": \"Text\"}, {\"bbox_2d\": [43, 238, 172, 250], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 256, 1009, 329], \"label\": \"Text\"}, {\"bbox_2d\": [25, 331, 1009, 355], \"label\": \"Text\"}, {\"bbox_2d\": [25, 387, 1009, 443], \"label\": \"Text\"}, {\"bbox_2d\": [25, 448, 1009, 495], \"label\": \"Text\"}, {\"bbox_2d\": [25, 520, 1009, 593], \"label\": \"Text\"}, {\"bbox_2d\": [25, 597, 1009, 634], \"label\": \"Text\"}, {\"bbox_2d\": [25, 367, 160, 377], \"label\": \"Section-header\"}, {\"bbox_2d\": [38, 506, 164, 517], \"label\": \"Section-header\"}, {\"bbox_2d\": [397, 664, 414, 674], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 60, 1009, 105], \"label\": \"Text\"}, {\"bbox_2d\": [25, 113, 1009, 169], \"label\": \"Text\"}, {\"bbox_2d\": [25, 176, 1009, 232], \"label\": \"Text\"}, {\"bbox_2d\": [25, 257, 1009, 325], \"label\": \"Text\"}, {\"bbox_2d\": [25, 332, 1009, 354], \"label\": \"Text\"}, {\"bbox_2d\": [25, 385, 1009, 441], \"label\": \"Text\"}, {\"bbox_2d\": [25, 448, 1009, 493], \"label\": \"Text\"}, {\"bbox_2d\": [25, 524, 1009, 592], \"label\": \"Text\"}, {\"bbox_2d\": [25, 599, 1009, 632], \"label\": \"Text\"}, {\"bbox_2d\": [508, 664, 525, 675], \"label\": \"Page-footer\"}, {\"bbox_2d\": [40, 506, 164, 516], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 239, 172, 250], \"label\": \"Section-header\"}, {\"bbox_2d\": [40, 367, 159, 377], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [323, 38, 715, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [911, 38, 923, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [89, 512, 482, 559], \"label\": \"Text\"}, {\"bbox_2d\": [89, 560, 506, 667], \"label\": \"Text\"}, {\"bbox_2d\": [89, 668, 506, 782], \"label\": \"Text\"}, {\"bbox_2d\": [89, 783, 482, 861], \"label\": \"Text\"}, {\"bbox_2d\": [89, 862, 506, 946], \"label\": \"Text\"}, {\"bbox_2d\": [89, 974, 216, 986], \"label\": \"Section-header\"}, {\"bbox_2d\": [482, 902, 905, 929], \"label\": \"Caption\"}, {\"bbox_2d\": [511, 454, 903, 481], \"label\": \"Caption\"}, {\"bbox_2d\": [89, 454, 482, 479], \"label\": \"Caption\"}, {\"bbox_2d\": [511, 71, 890, 422], \"label\": \"Picture\"}, {\"bbox_2d\": [510, 522, 890, 871], \"label\": \"Picture\"}, {\"bbox_2d\": [90, 71, 470, 423], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [939, 39, 948, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [321, 39, 715, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [524, 70, 892, 435], \"label\": \"Picture\"}, {\"bbox_2d\": [99, 71, 466, 433], \"label\": \"Picture\"}, {\"bbox_2d\": [511, 456, 903, 480], \"label\": \"Caption\"}, {\"bbox_2d\": [88, 456, 480, 480], \"label\": \"Caption\"}, {\"bbox_2d\": [88, 514, 480, 555], \"label\": \"Text\"}, {\"bbox_2d\": [88, 559, 480, 661], \"label\": \"Text\"}, {\"bbox_2d\": [88, 664, 480, 781], \"label\": \"Text\"}, {\"bbox_2d\": [88, 785, 480, 856], \"label\": \"Text\"}, {\"bbox_2d\": [88, 860, 480, 947], \"label\": \"Text\"}, {\"bbox_2d\": [88, 974, 216, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [511, 904, 903, 928], \"label\": \"Caption\"}, {\"bbox_2d\": [523, 519, 889, 881], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 892, 161], \"label\": \"Text\"}, {\"bbox_2d\": [274, 200, 910, 263], \"label\": \"Text\"}, {\"bbox_2d\": [274, 279, 908, 389], \"label\": \"Text\"}, {\"bbox_2d\": [274, 404, 907, 451], \"label\": \"Text\"}, {\"bbox_2d\": [274, 467, 910, 545], \"label\": \"Text\"}, {\"bbox_2d\": [274, 561, 904, 592], \"label\": \"Text\"}, {\"bbox_2d\": [211, 594, 630, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [323, 614, 622, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 646, 856, 677], \"label\": \"Text\"}, {\"bbox_2d\": [274, 693, 897, 757], \"label\": \"Text\"}, {\"bbox_2d\": [274, 796, 890, 827], \"label\": \"Text\"}, {\"bbox_2d\": [274, 866, 876, 929], \"label\": \"Text\"}, {\"bbox_2d\": [199, 841, 503, 863], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 770, 375, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 979, 380, 992], \"label\": \"Page-footer\"}, {\"bbox_2d\": [197, 176, 262, 196], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 891, 161], \"label\": \"Text\"}, {\"bbox_2d\": [273, 202, 909, 263], \"label\": \"Text\"}, {\"bbox_2d\": [273, 281, 905, 388], \"label\": \"Text\"}, {\"bbox_2d\": [273, 406, 904, 451], \"label\": \"Text\"}, {\"bbox_2d\": [273, 469, 907, 544], \"label\": \"Text\"}, {\"bbox_2d\": [273, 649, 853, 678], \"label\": \"Text\"}, {\"bbox_2d\": [273, 696, 892, 756], \"label\": \"Text\"}, {\"bbox_2d\": [273, 798, 887, 826], \"label\": \"Text\"}, {\"bbox_2d\": [273, 868, 872, 928], \"label\": \"Text\"}, {\"bbox_2d\": [197, 845, 502, 861], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 774, 375, 791], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 179, 261, 196], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 563, 897, 591], \"label\": \"List-item\"}, {\"bbox_2d\": [323, 596, 630, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [323, 615, 621, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 660, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [196, 517, 451, 539], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 541, 906, 557], \"label\": \"Text\"}, {\"bbox_2d\": [274, 560, 788, 576], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 580, 904, 612], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 486, 895, 503], \"label\": \"Text\"}, {\"bbox_2d\": [272, 619, 909, 916], \"label\": \"Picture\"}, {\"bbox_2d\": [273, 97, 914, 461], \"label\": \"Table\"}, {\"bbox_2d\": [101, 975, 378, 990], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [272, 86, 660, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 544, 905, 558], \"label\": \"Text\"}, {\"bbox_2d\": [273, 489, 895, 503], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [197, 521, 451, 538], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 97, 912, 460], \"label\": \"Table\"}, {\"bbox_2d\": [297, 623, 824, 894], \"label\": \"Text\"}, {\"bbox_2d\": [273, 562, 788, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 581, 903, 612], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 108, 914, 248], \"label\": \"Text\"}, {\"bbox_2d\": [121, 265, 914, 311], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 328, 914, 405], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 422, 778, 437], \"label\": \"Text\"}, {\"bbox_2d\": [121, 485, 914, 562], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 579, 914, 626], \"label\": \"Text\"}, {\"bbox_2d\": [121, 643, 914, 750], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 768, 914, 814], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 832, 900, 877], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 914, 891, 952], \"label\": \"Text\"}, {\"bbox_2d\": [388, 454, 647, 469], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [121, 107, 914, 248], \"label\": \"Text\"}, {\"bbox_2d\": [121, 264, 914, 311], \"label\": \"Text\"}, {\"bbox_2d\": [121, 327, 914, 405], \"label\": \"Text\"}, {\"bbox_2d\": [161, 421, 791, 437], \"label\": \"Text\"}, {\"bbox_2d\": [121, 484, 914, 562], \"label\": \"Text\"}, {\"bbox_2d\": [121, 578, 913, 625], \"label\": \"Text\"}, {\"bbox_2d\": [121, 641, 914, 751], \"label\": \"Text\"}, {\"bbox_2d\": [121, 767, 913, 814], \"label\": \"Text\"}, {\"bbox_2d\": [121, 829, 913, 876], \"label\": \"Text\"}, {\"bbox_2d\": [121, 910, 897, 954], \"label\": \"Page-footer\"}, {\"bbox_2d\": [388, 452, 647, 468], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [422, 21, 891, 50], \"label\": \"Picture\"}, {\"bbox_2d\": [131, 71, 545, 110], \"label\": \"Title\"}, {\"bbox_2d\": [564, 100, 893, 116], \"label\": \"Title\"}, {\"bbox_2d\": [564, 116, 894, 132], \"label\": \"Title\"}, {\"bbox_2d\": [131, 159, 485, 220], \"label\": \"Title\"}, {\"bbox_2d\": [131, 235, 503, 308], \"label\": \"Title\"}, {\"bbox_2d\": [131, 323, 473, 347], \"label\": \"Title\"}, {\"bbox_2d\": [131, 362, 345, 372], \"label\": \"Title\"}, {\"bbox_2d\": [131, 387, 360, 397], \"label\": \"Title\"}, {\"bbox_2d\": [526, 159, 842, 167], \"label\": \"Title\"}, {\"bbox_2d\": [555, 182, 894, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 241, 772, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 289, 895, 297], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 312, 752, 321], \"label\": \"Title\"}, {\"bbox_2d\": [526, 337, 894, 397], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [593, 52, 736, 63], \"label\": \"Text\"}, {\"bbox_2d\": [423, 20, 892, 51], \"label\": \"Picture\"}, {\"bbox_2d\": [562, 97, 898, 110], \"label\": \"Title\"}, {\"bbox_2d\": [562, 114, 897, 130], \"label\": \"Title\"}, {\"bbox_2d\": [131, 69, 333, 88], \"label\": \"Title\"}, {\"bbox_2d\": [131, 93, 546, 112], \"label\": \"Title\"}, {\"bbox_2d\": [163, 116, 296, 129], \"label\": \"Title\"}, {\"bbox_2d\": [131, 157, 484, 220], \"label\": \"Title\"}, {\"bbox_2d\": [131, 234, 494, 310], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 325, 471, 348], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 362, 345, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 387, 361, 397], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 157, 842, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [626, 221, 794, 231], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 313, 753, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 240, 616, 250], \"label\": \"List-item\"}, {\"bbox_2d\": [566, 253, 773, 277], \"label\": \"Table\"}, {\"bbox_2d\": [526, 285, 894, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [549, 181, 895, 207], \"label\": \"Table\"}, {\"bbox_2d\": [526, 333, 898, 396], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [742, 51, 942, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [160, 95, 812, 954], \"label\": \"Picture\"}, {\"bbox_2d\": [353, 952, 685, 971], \"label\": \"Caption\"}, {\"bbox_2d\": [508, 988, 533, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [354, 955, 702, 971], \"label\": \"Caption\"}, {\"bbox_2d\": [509, 985, 542, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [118, 42, 979, 947], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [821, 1004, 953, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [533, 789, 942, 895], \"label\": \"Text\"}, {\"bbox_2d\": [81, 621, 503, 885], \"label\": \"Text\"}, {\"bbox_2d\": [533, 529, 951, 679], \"label\": \"Text\"}, {\"bbox_2d\": [533, 378, 951, 515], \"label\": \"Text\"}, {\"bbox_2d\": [533, 182, 951, 367], \"label\": \"Text\"}, {\"bbox_2d\": [81, 214, 495, 288], \"label\": \"Text\"}, {\"bbox_2d\": [81, 299, 495, 610], \"label\": \"Text\"}, {\"bbox_2d\": [533, 112, 953, 169], \"label\": \"Text\"}, {\"bbox_2d\": [81, 112, 496, 202], \"label\": \"Text\"}, {\"bbox_2d\": [533, 688, 955, 769], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [81, 112, 499, 200], \"label\": \"Text\"}, {\"bbox_2d\": [81, 214, 501, 287], \"label\": \"Text\"}, {\"bbox_2d\": [81, 300, 502, 609], \"label\": \"Text\"}, {\"bbox_2d\": [81, 622, 502, 883], \"label\": \"Text\"}, {\"bbox_2d\": [533, 789, 944, 894], \"label\": \"Text\"}, {\"bbox_2d\": [533, 112, 953, 169], \"label\": \"Text\"}, {\"bbox_2d\": [533, 182, 946, 365], \"label\": \"Text\"}, {\"bbox_2d\": [533, 378, 954, 514], \"label\": \"Text\"}, {\"bbox_2d\": [533, 528, 951, 679], \"label\": \"Text\"}, {\"bbox_2d\": [532, 691, 954, 768], \"label\": \"Table\"}, {\"bbox_2d\": [822, 1005, 955, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [32, 208, 60, 770], \"label\": \"Page-header\"}, {\"bbox_2d\": [111, 102, 819, 131], \"label\": \"Text\"}, {\"bbox_2d\": [269, 136, 651, 158], \"label\": \"Text\"}, {\"bbox_2d\": [166, 164, 819, 176], \"label\": \"Text\"}, {\"bbox_2d\": [166, 182, 819, 204], \"label\": \"Text\"}, {\"bbox_2d\": [178, 209, 556, 231], \"label\": \"Text\"}, {\"bbox_2d\": [182, 244, 831, 308], \"label\": \"Text\"}, {\"bbox_2d\": [182, 322, 473, 334], \"label\": \"Text\"}, {\"bbox_2d\": [91, 362, 506, 586], \"label\": \"Text\"}, {\"bbox_2d\": [536, 362, 842, 493], \"label\": \"Text\"}, {\"bbox_2d\": [536, 498, 843, 690], \"label\": \"Text\"}, {\"bbox_2d\": [91, 586, 506, 844], \"label\": \"Text\"}, {\"bbox_2d\": [536, 695, 843, 949], \"label\": \"Text\"}, {\"bbox_2d\": [91, 844, 506, 949], \"label\": \"Text\"}, {\"bbox_2d\": [152, 69, 780, 84], \"label\": \"Title\"}]```", "label": "```json[{\"bbox_2d\": [151, 70, 891, 84], \"label\": \"Title\"}, {\"bbox_2d\": [31, 210, 61, 771], \"label\": \"Page-header\"}, {\"bbox_2d\": [111, 100, 930, 131], \"label\": \"Text\"}, {\"bbox_2d\": [270, 138, 771, 149], \"label\": \"Text\"}, {\"bbox_2d\": [223, 152, 819, 162], \"label\": \"Text\"}, {\"bbox_2d\": [165, 165, 877, 176], \"label\": \"Text\"}, {\"bbox_2d\": [288, 179, 753, 204], \"label\": \"Text\"}, {\"bbox_2d\": [180, 206, 862, 231], \"label\": \"Text\"}, {\"bbox_2d\": [184, 242, 858, 308], \"label\": \"Text\"}, {\"bbox_2d\": [184, 324, 471, 334], \"label\": \"Text\"}, {\"bbox_2d\": [91, 362, 506, 585], \"label\": \"Text\"}, {\"bbox_2d\": [91, 589, 506, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 696, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 499, 951, 692], \"label\": \"Text\"}, {\"bbox_2d\": [536, 362, 951, 495], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [132, 95, 911, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 153, 911, 196], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 209, 911, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 266, 911, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 323, 911, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 379, 911, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 436, 911, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 493, 911, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 550, 911, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 607, 911, 649], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 663, 911, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 719, 911, 762], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 776, 911, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 834, 911, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 890, 911, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 946, 811, 962], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [131, 97, 911, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 154, 911, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 210, 911, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 267, 911, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 324, 911, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 380, 911, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 437, 911, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 494, 911, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 551, 911, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 607, 911, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 664, 911, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 721, 911, 762], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 777, 911, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 834, 911, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 891, 911, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 947, 811, 960], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [294, 411, 934, 456], \"label\": \"Text\"}, {\"bbox_2d\": [294, 474, 893, 503], \"label\": \"Text\"}, {\"bbox_2d\": [294, 544, 863, 559], \"label\": \"Text\"}, {\"bbox_2d\": [218, 521, 447, 538], \"label\": \"Section-header\"}, {\"bbox_2d\": [644, 979, 937, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 87, 930, 384], \"label\": \"Picture\"}, {\"bbox_2d\": [295, 561, 930, 858], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [293, 412, 928, 457], \"label\": \"Text\"}, {\"bbox_2d\": [293, 475, 890, 504], \"label\": \"Text\"}, {\"bbox_2d\": [293, 546, 858, 559], \"label\": \"Text\"}, {\"bbox_2d\": [644, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [317, 570, 844, 840], \"label\": \"Text\"}, {\"bbox_2d\": [317, 94, 844, 364], \"label\": \"Text\"}, {\"bbox_2d\": [217, 522, 446, 539], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 103, 506, 298], \"label\": \"Text\"}, {\"bbox_2d\": [536, 212, 951, 263], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 307, 951, 335], \"label\": \"Text\"}, {\"bbox_2d\": [91, 358, 506, 448], \"label\": \"Text\"}, {\"bbox_2d\": [536, 396, 951, 677], \"label\": \"Text\"}, {\"bbox_2d\": [536, 679, 951, 720], \"label\": \"Text\"}, {\"bbox_2d\": [536, 829, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 755, 951, 815], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 747, 506, 792], \"label\": \"Text\"}, {\"bbox_2d\": [536, 896, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [566, 367, 810, 378], \"label\": \"Section-header\"}, {\"bbox_2d\": [237, 482, 364, 493], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 508, 506, 587], \"label\": \"Text\"}, {\"bbox_2d\": [91, 644, 506, 687], \"label\": \"Text\"}, {\"bbox_2d\": [536, 691, 566, 719], \"label\": \"Text\"}, {\"bbox_2d\": [91, 874, 506, 938], \"label\": \"Text\"}, {\"bbox_2d\": [176, 798, 506, 858], \"label\": \"Formula\"}, {\"bbox_2d\": [176, 869, 506, 948], \"label\": \"Formula\"}, {\"bbox_2d\": [176, 921, 506, 948], \"label\": \"Formula\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [237, 72, 368, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 66, 919, 194], \"label\": \"Picture\"}, {\"bbox_2d\": [103, 593, 506, 638], \"label\": \"Formula\"}, {\"bbox_2d\": [241, 328, 359, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [239, 702, 506, 740], \"label\": \"Formula\"}, {\"bbox_2d\": [143, 714, 506, 742], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [943, 38, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [229, 73, 368, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [239, 329, 358, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [677, 367, 811, 378], \"label\": \"Section-header\"}, {\"bbox_2d\": [234, 480, 363, 491], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 105, 515, 297], \"label\": \"Text\"}, {\"bbox_2d\": [536, 306, 951, 335], \"label\": \"Text\"}, {\"bbox_2d\": [91, 362, 512, 448], \"label\": \"Text\"}, {\"bbox_2d\": [536, 400, 961, 651], \"label\": \"Text\"}, {\"bbox_2d\": [536, 655, 951, 744], \"label\": \"Text\"}, {\"bbox_2d\": [91, 513, 509, 585], \"label\": \"Text\"}, {\"bbox_2d\": [91, 644, 506, 687], \"label\": \"Text\"}, {\"bbox_2d\": [91, 747, 506, 792], \"label\": \"Text\"}, {\"bbox_2d\": [91, 877, 515, 916], \"label\": \"Text\"}, {\"bbox_2d\": [183, 800, 506, 840], \"label\": \"Formula\"}, {\"bbox_2d\": [181, 836, 506, 875], \"label\": \"Formula\"}, {\"bbox_2d\": [176, 919, 506, 958], \"label\": \"Formula\"}, {\"bbox_2d\": [576, 755, 951, 825], \"label\": \"Formula\"}, {\"bbox_2d\": [126, 699, 506, 738], \"label\": \"Formula\"}, {\"bbox_2d\": [100, 596, 506, 642], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 211, 960, 263], \"label\": \"Caption\"}, {\"bbox_2d\": [575, 65, 913, 193], \"label\": \"Picture\"}, {\"bbox_2d\": [536, 829, 952, 946], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [452, 46, 653, 199], \"label\": \"Text\"}, {\"bbox_2d\": [731, 46, 934, 246], \"label\": \"Text\"}, {\"bbox_2d\": [278, 327, 934, 379], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [731, 48, 932, 247], \"label\": \"Text\"}, {\"bbox_2d\": [452, 48, 657, 199], \"label\": \"Text\"}, {\"bbox_2d\": [58, 282, 1035, 990], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [250, 73, 348, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 105, 506, 284], \"label\": \"Text\"}, {\"bbox_2d\": [91, 326, 506, 366], \"label\": \"Text\"}, {\"bbox_2d\": [91, 369, 506, 397], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 361], \"label\": \"Text\"}, {\"bbox_2d\": [536, 395, 951, 724], \"label\": \"Text\"}, {\"bbox_2d\": [91, 498, 506, 603], \"label\": \"Text\"}, {\"bbox_2d\": [91, 666, 506, 694], \"label\": \"Text\"}, {\"bbox_2d\": [91, 697, 506, 724], \"label\": \"Text\"}, {\"bbox_2d\": [536, 783, 951, 825], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 824, 951, 862], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 863, 951, 891], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 892, 951, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 783, 506, 825], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 834, 506, 862], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 871, 506, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 924, 506, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [238, 412, 506, 431], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 467, 506, 485], \"label\": \"Formula\"}, {\"bbox_2d\": [136, 612, 506, 650], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 442, 134, 453], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 297, 506, 325], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [249, 73, 347, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 72, 951, 359], \"label\": \"Text\"}, {\"bbox_2d\": [91, 105, 506, 282], \"label\": \"Text\"}, {\"bbox_2d\": [91, 325, 506, 368], \"label\": \"Text\"}, {\"bbox_2d\": [91, 370, 506, 398], \"label\": \"Text\"}, {\"bbox_2d\": [91, 442, 134, 454], \"label\": \"Text\"}, {\"bbox_2d\": [91, 498, 506, 600], \"label\": \"Text\"}, {\"bbox_2d\": [536, 394, 951, 722], \"label\": \"Text\"}, {\"bbox_2d\": [91, 666, 506, 692], \"label\": \"Text\"}, {\"bbox_2d\": [91, 696, 506, 722], \"label\": \"Text\"}, {\"bbox_2d\": [125, 297, 506, 310], \"label\": \"Formula\"}, {\"bbox_2d\": [241, 411, 506, 426], \"label\": \"Formula\"}, {\"bbox_2d\": [206, 469, 506, 482], \"label\": \"Formula\"}, {\"bbox_2d\": [934, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [544, 892, 951, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 865, 951, 889], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 823, 951, 862], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 782, 951, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 919, 506, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 878, 506, 916], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 851, 506, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 837, 506, 848], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 782, 506, 834], \"label\": \"List-item\"}, {\"bbox_2d\": [136, 613, 506, 649], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [150, 158, 309, 170], \"label\": \"Section-header\"}, {\"bbox_2d\": [346, 95, 714, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [650, 45, 911, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [274, 68, 786, 77], \"label\": \"Text\"}, {\"bbox_2d\": [147, 172, 453, 213], \"label\": \"Text\"}, {\"bbox_2d\": [147, 284, 911, 395], \"label\": \"Text\"}, {\"bbox_2d\": [147, 411, 911, 479], \"label\": \"Text\"}, {\"bbox_2d\": [147, 495, 911, 580], \"label\": \"Text\"}, {\"bbox_2d\": [147, 595, 911, 621], \"label\": \"Text\"}, {\"bbox_2d\": [147, 636, 911, 675], \"label\": \"Text\"}, {\"bbox_2d\": [147, 691, 392, 717], \"label\": \"Text\"}, {\"bbox_2d\": [522, 691, 768, 717], \"label\": \"Text\"}, {\"bbox_2d\": [522, 765, 708, 776], \"label\": \"Text\"}, {\"bbox_2d\": [147, 765, 335, 791], \"label\": \"Text\"}, {\"bbox_2d\": [522, 780, 625, 791], \"label\": \"Text\"}, {\"bbox_2d\": [147, 949, 911, 974], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 980, 868, 990], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 228, 389, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 241, 911, 271], \"label\": \"Section-header\"}, {\"bbox_2d\": [841, 1009, 911, 1018], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [274, 67, 790, 78], \"label\": \"Text\"}, {\"bbox_2d\": [346, 93, 720, 110], \"label\": \"Title\"}, {\"bbox_2d\": [147, 156, 458, 212], \"label\": \"Text\"}, {\"bbox_2d\": [147, 283, 918, 396], \"label\": \"Text\"}, {\"bbox_2d\": [147, 408, 918, 479], \"label\": \"Text\"}, {\"bbox_2d\": [147, 491, 918, 580], \"label\": \"Text\"}, {\"bbox_2d\": [147, 592, 918, 620], \"label\": \"Text\"}, {\"bbox_2d\": [147, 633, 918, 675], \"label\": \"Text\"}, {\"bbox_2d\": [147, 690, 400, 704], \"label\": \"Text\"}, {\"bbox_2d\": [147, 704, 346, 718], \"label\": \"Text\"}, {\"bbox_2d\": [523, 690, 776, 704], \"label\": \"Text\"}, {\"bbox_2d\": [523, 704, 722, 718], \"label\": \"Text\"}, {\"bbox_2d\": [523, 765, 716, 778], \"label\": \"Text\"}, {\"bbox_2d\": [523, 779, 632, 793], \"label\": \"Text\"}, {\"bbox_2d\": [147, 761, 343, 778], \"label\": \"Text\"}, {\"bbox_2d\": [147, 779, 249, 793], \"label\": \"Text\"}, {\"bbox_2d\": [147, 946, 917, 972], \"label\": \"Footnote\"}, {\"bbox_2d\": [147, 977, 873, 991], \"label\": \"Footnote\"}, {\"bbox_2d\": [841, 1010, 916, 1021], \"label\": \"Page-footer\"}, {\"bbox_2d\": [650, 44, 917, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [147, 226, 913, 268], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [227, 103, 678, 515], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 520, 545, 533], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 551, 908, 583], \"label\": \"Text\"}, {\"bbox_2d\": [231, 655, 926, 767], \"label\": \"Text\"}, {\"bbox_2d\": [231, 778, 923, 826], \"label\": \"Text\"}, {\"bbox_2d\": [231, 838, 930, 886], \"label\": \"Text\"}, {\"bbox_2d\": [231, 898, 875, 915], \"label\": \"Text\"}, {\"bbox_2d\": [109, 984, 457, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 594, 473, 634], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [231, 519, 539, 531], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 598, 469, 609], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 554, 908, 581], \"label\": \"Text\"}, {\"bbox_2d\": [231, 658, 926, 764], \"label\": \"Text\"}, {\"bbox_2d\": [231, 781, 921, 825], \"label\": \"Text\"}, {\"bbox_2d\": [231, 841, 926, 885], \"label\": \"Text\"}, {\"bbox_2d\": [231, 902, 869, 914], \"label\": \"Text\"}, {\"bbox_2d\": [231, 620, 400, 632], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 453, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 102, 676, 516], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [787, 77, 988, 108], \"label\": \"Page-header\"}, {\"bbox_2d\": [583, 121, 990, 212], \"label\": \"Text\"}, {\"bbox_2d\": [583, 260, 637, 275], \"label\": \"Text\"}, {\"bbox_2d\": [135, 121, 544, 155], \"label\": \"Text\"}, {\"bbox_2d\": [135, 377, 544, 559], \"label\": \"Text\"}, {\"bbox_2d\": [135, 562, 544, 856], \"label\": \"Text\"}, {\"bbox_2d\": [135, 859, 545, 983], \"label\": \"Text\"}, {\"bbox_2d\": [583, 819, 991, 983], \"label\": \"Text\"}, {\"bbox_2d\": [548, 994, 573, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [583, 469, 991, 816], \"label\": \"Text\"}, {\"bbox_2d\": [583, 394, 990, 467], \"label\": \"Text\"}, {\"bbox_2d\": [593, 281, 858, 296], \"label\": \"Text\"}, {\"bbox_2d\": [594, 303, 884, 318], \"label\": \"Text\"}, {\"bbox_2d\": [626, 325, 990, 356], \"label\": \"Text\"}, {\"bbox_2d\": [622, 363, 848, 378], \"label\": \"Text\"}, {\"bbox_2d\": [135, 254, 188, 270], \"label\": \"Text\"}, {\"bbox_2d\": [199, 169, 466, 203], \"label\": \"Formula\"}, {\"bbox_2d\": [199, 217, 312, 251], \"label\": \"Formula\"}, {\"bbox_2d\": [173, 274, 431, 362], \"label\": \"Formula\"}, {\"bbox_2d\": [645, 220, 814, 254], \"label\": \"Formula\"}, {\"bbox_2d\": [736, 322, 988, 354], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [786, 78, 1006, 108], \"label\": \"Page-header\"}, {\"bbox_2d\": [583, 119, 998, 210], \"label\": \"Text\"}, {\"bbox_2d\": [136, 119, 552, 154], \"label\": \"Text\"}, {\"bbox_2d\": [215, 167, 473, 203], \"label\": \"Text\"}, {\"bbox_2d\": [136, 251, 197, 268], \"label\": \"Text\"}, {\"bbox_2d\": [200, 270, 413, 287], \"label\": \"List-item\"}, {\"bbox_2d\": [204, 289, 377, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [202, 307, 427, 324], \"label\": \"List-item\"}, {\"bbox_2d\": [173, 325, 548, 361], \"label\": \"List-item\"}, {\"bbox_2d\": [627, 279, 861, 296], \"label\": \"List-item\"}, {\"bbox_2d\": [624, 300, 896, 317], \"label\": \"List-item\"}, {\"bbox_2d\": [629, 321, 996, 357], \"label\": \"List-item\"}, {\"bbox_2d\": [622, 360, 853, 378], \"label\": \"List-item\"}, {\"bbox_2d\": [583, 257, 644, 274], \"label\": \"Text\"}, {\"bbox_2d\": [582, 392, 1001, 464], \"label\": \"Text\"}, {\"bbox_2d\": [134, 375, 556, 558], \"label\": \"Text\"}, {\"bbox_2d\": [581, 465, 1004, 815], \"label\": \"Text\"}, {\"bbox_2d\": [132, 560, 549, 853], \"label\": \"Text\"}, {\"bbox_2d\": [581, 817, 1002, 982], \"label\": \"Text\"}, {\"bbox_2d\": [131, 854, 551, 982], \"label\": \"Text\"}, {\"bbox_2d\": [547, 990, 585, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [156, 91, 885, 119], \"label\": \"Text\"}, {\"bbox_2d\": [156, 163, 884, 191], \"label\": \"Text\"}, {\"bbox_2d\": [187, 198, 541, 212], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 219, 540, 233], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 239, 368, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 259, 730, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 279, 690, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 297, 641, 310], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 317, 655, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 356, 726, 370], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 377, 884, 405], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 412, 788, 426], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 433, 884, 459], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 465, 805, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 484, 864, 498], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 505, 612, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 525, 884, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 560, 884, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 594, 663, 607], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 614, 559, 628], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 635, 884, 662], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 669, 884, 697], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 703, 496, 717], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 723, 884, 750], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 757, 412, 770], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 777, 482, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 797, 884, 825], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 832, 884, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 139, 455, 153], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [156, 91, 884, 117], \"label\": \"Text\"}, {\"bbox_2d\": [156, 165, 884, 192], \"label\": \"Text\"}, {\"bbox_2d\": [171, 338, 560, 350], \"label\": \"Text\"}, {\"bbox_2d\": [186, 200, 541, 211], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 219, 540, 231], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 239, 368, 251], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 259, 731, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 279, 691, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 299, 639, 310], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 318, 656, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 358, 728, 370], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 378, 884, 404], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 412, 790, 424], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 432, 884, 459], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 467, 806, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 487, 864, 498], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 506, 611, 518], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 526, 884, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 561, 884, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 595, 662, 607], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 615, 559, 627], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 635, 884, 661], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 669, 884, 696], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 724, 884, 750], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 758, 411, 770], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 778, 481, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 798, 884, 825], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 833, 884, 874], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 704, 497, 716], \"label\": \"Text\"}, {\"bbox_2d\": [156, 140, 455, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [95, 50, 936, 947], \"label\": \"Table\"}, {\"bbox_2d\": [99, 988, 548, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [99, 990, 550, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [139, 56, 463, 68], \"label\": \"Section-header\"}, {\"bbox_2d\": [515, 58, 686, 67], \"label\": \"Text\"}, {\"bbox_2d\": [647, 77, 796, 87], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 132, 711, 144], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 148, 918, 159], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 171, 926, 572], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 575, 739, 585], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 692, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 131, 711, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 147, 917, 159], \"label\": \"List-item\"}, {\"bbox_2d\": [233, 171, 927, 571], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 575, 733, 586], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 83, 592, 97], \"label\": \"Text\"}, {\"bbox_2d\": [273, 103, 907, 134], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 139, 579, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 158, 502, 172], \"label\": \"List-item\"}, {\"bbox_2d\": [289, 177, 492, 191], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 197, 525, 211], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 217, 572, 231], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 236, 630, 249], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 255, 480, 269], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 275, 505, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 295, 493, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 315, 510, 329], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 335, 555, 349], \"label\": \"List-item\"}, {\"bbox_2d\": [312, 355, 554, 369], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 374, 502, 388], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 394, 485, 408], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 414, 492, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 434, 505, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 453, 481, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 472, 534, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 492, 525, 506], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 512, 621, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 531, 514, 545], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 551, 521, 565], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 571, 534, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 590, 621, 604], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 610, 517, 624], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 629, 509, 643], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 659, 884, 689], \"label\": \"Text\"}, {\"bbox_2d\": [273, 629, 509, 643], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 571, 534, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 590, 536, 604], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 610, 518, 624], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 630, 518, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [290, 650, 520, 664], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 659, 884, 689], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 592, 98], \"label\": \"Text\"}, {\"bbox_2d\": [273, 102, 905, 133], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 137, 580, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 157, 503, 173], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 179, 492, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 198, 525, 212], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 218, 571, 231], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 238, 630, 251], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 257, 479, 270], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 277, 504, 290], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 296, 492, 310], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 316, 510, 329], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 335, 555, 349], \"label\": \"List-item\"}, {\"bbox_2d\": [313, 355, 555, 368], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 375, 501, 388], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 394, 484, 407], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 414, 491, 427], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 433, 504, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 453, 481, 466], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 470, 532, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 492, 521, 505], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 512, 620, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 531, 514, 545], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 551, 521, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 570, 533, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 590, 621, 603], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 610, 517, 623], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 627, 511, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 660, 884, 689], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 327, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 93, 722, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 126, 348, 134], \"label\": \"Text\"}, {\"bbox_2d\": [118, 152, 498, 536], \"label\": \"Picture\"}, {\"bbox_2d\": [126, 560, 500, 599], \"label\": \"Text\"}, {\"bbox_2d\": [157, 654, 263, 713], \"label\": \"Picture\"}, {\"bbox_2d\": [525, 274, 894, 447], \"label\": \"Text\"}, {\"bbox_2d\": [525, 455, 894, 507], \"label\": \"Text\"}, {\"bbox_2d\": [525, 638, 894, 707], \"label\": \"Text\"}, {\"bbox_2d\": [525, 715, 894, 766], \"label\": \"Text\"}, {\"bbox_2d\": [525, 771, 895, 831], \"label\": \"Text\"}, {\"bbox_2d\": [525, 837, 894, 858], \"label\": \"Text\"}, {\"bbox_2d\": [525, 864, 895, 943], \"label\": \"Text\"}, {\"bbox_2d\": [131, 869, 500, 971], \"label\": \"Picture\"}, {\"bbox_2d\": [131, 737, 500, 820], \"label\": \"Text\"}, {\"bbox_2d\": [131, 827, 500, 865], \"label\": \"Text\"}, {\"bbox_2d\": [526, 946, 895, 983], \"label\": \"Text\"}, {\"bbox_2d\": [553, 541, 895, 608], \"label\": \"Picture\"}, {\"bbox_2d\": [674, 122, 895, 251], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 138, 659, 211], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [698, 92, 721, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 74, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 92, 326, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [281, 125, 350, 135], \"label\": \"Text\"}, {\"bbox_2d\": [524, 274, 898, 449], \"label\": \"Text\"}, {\"bbox_2d\": [526, 455, 895, 506], \"label\": \"Text\"}, {\"bbox_2d\": [526, 639, 895, 704], \"label\": \"Text\"}, {\"bbox_2d\": [524, 710, 894, 760], \"label\": \"Text\"}, {\"bbox_2d\": [526, 768, 894, 830], \"label\": \"Text\"}, {\"bbox_2d\": [526, 836, 894, 859], \"label\": \"Text\"}, {\"bbox_2d\": [525, 865, 894, 940], \"label\": \"Text\"}, {\"bbox_2d\": [526, 946, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [130, 826, 500, 865], \"label\": \"Text\"}, {\"bbox_2d\": [130, 737, 500, 819], \"label\": \"Text\"}, {\"bbox_2d\": [131, 559, 500, 598], \"label\": \"Text\"}, {\"bbox_2d\": [554, 532, 895, 617], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 644, 501, 712], \"label\": \"Picture\"}, {\"bbox_2d\": [131, 895, 499, 971], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 121, 889, 251], \"label\": \"Picture\"}, {\"bbox_2d\": [124, 150, 498, 533], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [201, 211, 552, 262], \"label\": \"Title\"}, {\"bbox_2d\": [388, 876, 977, 983], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [492, 876, 973, 983], \"label\": \"Title\"}, {\"bbox_2d\": [40, 207, 557, 260], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [268, 34, 1001, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [0, 89, 1010, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [607, 894, 1000, 996], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [270, 31, 999, 84], \"label\": \"Title\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 84, 490, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 109, 883, 136], \"label\": \"Text\"}, {\"bbox_2d\": [333, 148, 516, 160], \"label\": \"Text\"}, {\"bbox_2d\": [333, 172, 892, 197], \"label\": \"Text\"}, {\"bbox_2d\": [395, 210, 583, 222], \"label\": \"Text\"}, {\"bbox_2d\": [395, 251, 911, 277], \"label\": \"Text\"}, {\"bbox_2d\": [395, 305, 901, 425], \"label\": \"Text\"}, {\"bbox_2d\": [395, 454, 796, 466], \"label\": \"Text\"}, {\"bbox_2d\": [395, 478, 632, 489], \"label\": \"Text\"}, {\"bbox_2d\": [395, 502, 851, 513], \"label\": \"Text\"}, {\"bbox_2d\": [395, 542, 886, 582], \"label\": \"Text\"}, {\"bbox_2d\": [395, 611, 878, 622], \"label\": \"Text\"}, {\"bbox_2d\": [274, 674, 898, 719], \"label\": \"Text\"}, {\"bbox_2d\": [274, 737, 907, 749], \"label\": \"Text\"}, {\"bbox_2d\": [274, 644, 866, 663], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [333, 438, 478, 451], \"label\": \"Text\"}, {\"bbox_2d\": [395, 523, 467, 535], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 594, 484, 606], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 490, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 234, 393, 247], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 289, 403, 302], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 437, 479, 450], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 524, 468, 537], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 594, 483, 607], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 645, 866, 662], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 674, 897, 718], \"label\": \"Text\"}, {\"bbox_2d\": [273, 736, 907, 749], \"label\": \"Text\"}, {\"bbox_2d\": [456, 610, 879, 623], \"label\": \"Text\"}, {\"bbox_2d\": [456, 539, 887, 584], \"label\": \"Text\"}, {\"bbox_2d\": [395, 453, 795, 466], \"label\": \"Text\"}, {\"bbox_2d\": [395, 476, 632, 490], \"label\": \"Text\"}, {\"bbox_2d\": [395, 304, 899, 427], \"label\": \"Text\"}, {\"bbox_2d\": [395, 249, 911, 278], \"label\": \"Text\"}, {\"bbox_2d\": [395, 210, 583, 223], \"label\": \"Text\"}, {\"bbox_2d\": [334, 108, 884, 137], \"label\": \"Text\"}, {\"bbox_2d\": [334, 148, 516, 161], \"label\": \"Text\"}, {\"bbox_2d\": [334, 171, 891, 200], \"label\": \"List-item\"}, {\"bbox_2d\": [395, 500, 851, 513], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [28, 45, 902, 301], \"label\": \"Table\"}, {\"bbox_2d\": [508, 311, 525, 317], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [366, 42, 663, 65], \"label\": \"Section-header\"}, {\"bbox_2d\": [19, 77, 1011, 290], \"label\": \"Table\"}, {\"bbox_2d\": [511, 311, 522, 318], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [622, 648, 977, 676], \"label\": \"Caption\"}, {\"bbox_2d\": [98, 724, 949, 795], \"label\": \"Text\"}, {\"bbox_2d\": [98, 809, 949, 880], \"label\": \"Text\"}, {\"bbox_2d\": [98, 894, 949, 935], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [400, 243, 631, 672], \"label\": \"Picture\"}, {\"bbox_2d\": [624, 647, 977, 675], \"label\": \"Caption\"}, {\"bbox_2d\": [97, 723, 952, 794], \"label\": \"Text\"}, {\"bbox_2d\": [97, 808, 946, 879], \"label\": \"Text\"}, {\"bbox_2d\": [97, 893, 944, 935], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [783, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 94, 523, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 74, 313, 86], \"label\": \"Page-header\"}, {\"bbox_2d\": [564, 144, 633, 154], \"label\": \"Text\"}, {\"bbox_2d\": [159, 145, 445, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [527, 153, 866, 968], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [786, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [542, 142, 871, 971], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 142, 468, 967], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [675, 147, 776, 414], \"label\": \"Picture\"}, {\"bbox_2d\": [42, 143, 626, 388], \"label\": \"Picture\"}, {\"bbox_2d\": [45, 722, 929, 899], \"label\": \"Picture\"}, {\"bbox_2d\": [938, 927, 953, 945], \"label\": \"Page-footer\"}, {\"bbox_2d\": [45, 440, 320, 477], \"label\": \"Text\"}, {\"bbox_2d\": [45, 486, 320, 677], \"label\": \"Text\"}, {\"bbox_2d\": [346, 440, 621, 477], \"label\": \"Text\"}, {\"bbox_2d\": [346, 530, 621, 677], \"label\": \"Text\"}, {\"bbox_2d\": [648, 440, 921, 524], \"label\": \"Text\"}, {\"bbox_2d\": [648, 530, 921, 677], \"label\": \"Text\"}, {\"bbox_2d\": [346, 505, 456, 521], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [677, 146, 889, 407], \"label\": \"Picture\"}, {\"bbox_2d\": [41, 143, 626, 389], \"label\": \"Picture\"}, {\"bbox_2d\": [44, 439, 320, 477], \"label\": \"Text\"}, {\"bbox_2d\": [44, 483, 321, 678], \"label\": \"Text\"}, {\"bbox_2d\": [345, 439, 622, 477], \"label\": \"Text\"}, {\"bbox_2d\": [345, 503, 457, 522], \"label\": \"Section-header\"}, {\"bbox_2d\": [345, 528, 617, 678], \"label\": \"Text\"}, {\"bbox_2d\": [647, 439, 920, 521], \"label\": \"Text\"}, {\"bbox_2d\": [647, 528, 918, 678], \"label\": \"Text\"}, {\"bbox_2d\": [42, 723, 203, 868], \"label\": \"Picture\"}, {\"bbox_2d\": [767, 724, 928, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [586, 725, 748, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [405, 724, 566, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [223, 724, 384, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [769, 879, 907, 902], \"label\": \"Caption\"}, {\"bbox_2d\": [588, 879, 738, 902], \"label\": \"Caption\"}, {\"bbox_2d\": [406, 879, 526, 902], \"label\": \"Caption\"}, {\"bbox_2d\": [225, 879, 384, 902], \"label\": \"Caption\"}, {\"bbox_2d\": [44, 879, 194, 902], \"label\": \"Caption\"}, {\"bbox_2d\": [939, 928, 950, 944], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [67, 48, 354, 94], \"label\": \"Section-header\"}, {\"bbox_2d\": [67, 109, 482, 311], \"label\": \"Text\"}, {\"bbox_2d\": [523, 109, 937, 407], \"label\": \"Text\"}, {\"bbox_2d\": [67, 372, 88, 386], \"label\": \"Text\"}, {\"bbox_2d\": [67, 445, 117, 459], \"label\": \"Text\"}, {\"bbox_2d\": [67, 615, 482, 875], \"label\": \"Text\"}, {\"bbox_2d\": [67, 879, 482, 969], \"label\": \"Text\"}, {\"bbox_2d\": [523, 412, 937, 711], \"label\": \"Text\"}, {\"bbox_2d\": [523, 716, 937, 969], \"label\": \"Text\"}, {\"bbox_2d\": [489, 981, 516, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [207, 326, 342, 366], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 396, 344, 437], \"label\": \"Formula\"}, {\"bbox_2d\": [124, 468, 458, 597], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [69, 45, 246, 60], \"label\": \"Page-header\"}, {\"bbox_2d\": [67, 62, 363, 93], \"label\": \"Section-header\"}, {\"bbox_2d\": [67, 105, 481, 141], \"label\": \"Text\"}, {\"bbox_2d\": [66, 142, 494, 309], \"label\": \"Text\"}, {\"bbox_2d\": [213, 334, 351, 365], \"label\": \"Formula\"}, {\"bbox_2d\": [68, 367, 92, 385], \"label\": \"Text\"}, {\"bbox_2d\": [66, 440, 129, 458], \"label\": \"Text\"}, {\"bbox_2d\": [107, 464, 463, 594], \"label\": \"Text\"}, {\"bbox_2d\": [67, 608, 496, 870], \"label\": \"Text\"}, {\"bbox_2d\": [71, 871, 495, 964], \"label\": \"Text\"}, {\"bbox_2d\": [489, 971, 529, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [521, 105, 951, 405], \"label\": \"Text\"}, {\"bbox_2d\": [521, 406, 945, 708], \"label\": \"Text\"}, {\"bbox_2d\": [523, 709, 947, 963], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [430, 29, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [98, 94, 751, 133], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 176, 749, 195], \"label\": \"Text\"}, {\"bbox_2d\": [98, 219, 638, 284], \"label\": \"Text\"}, {\"bbox_2d\": [98, 310, 914, 372], \"label\": \"Text\"}, {\"bbox_2d\": [98, 398, 937, 504], \"label\": \"Text\"}, {\"bbox_2d\": [98, 531, 842, 549], \"label\": \"Text\"}, {\"bbox_2d\": [371, 605, 664, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 654, 728, 737], \"label\": \"Text\"}, {\"bbox_2d\": [309, 763, 721, 804], \"label\": \"Text\"}, {\"bbox_2d\": [256, 829, 778, 870], \"label\": \"Text\"}, {\"bbox_2d\": [243, 914, 680, 940], \"label\": \"Picture\"}, {\"bbox_2d\": [225, 957, 812, 978], \"label\": \"Text\"}, {\"bbox_2d\": [438, 999, 601, 1007], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [370, 597, 665, 631], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 86, 754, 136], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 174, 751, 195], \"label\": \"Text\"}, {\"bbox_2d\": [98, 218, 638, 284], \"label\": \"Text\"}, {\"bbox_2d\": [98, 306, 923, 372], \"label\": \"Text\"}, {\"bbox_2d\": [98, 395, 945, 505], \"label\": \"Text\"}, {\"bbox_2d\": [98, 528, 843, 550], \"label\": \"Text\"}, {\"bbox_2d\": [307, 651, 728, 739], \"label\": \"Text\"}, {\"bbox_2d\": [312, 761, 723, 805], \"label\": \"Text\"}, {\"bbox_2d\": [256, 828, 779, 872], \"label\": \"Text\"}, {\"bbox_2d\": [223, 958, 815, 979], \"label\": \"Text\"}, {\"bbox_2d\": [437, 997, 602, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [348, 910, 687, 944], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [275, 84, 705, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [272, 98, 913, 849], \"label\": \"Table\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [272, 86, 707, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [271, 97, 914, 850], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 91, 911, 155], \"label\": \"Text\"}, {\"bbox_2d\": [125, 164, 911, 227], \"label\": \"Text\"}, {\"bbox_2d\": [125, 237, 911, 276], \"label\": \"Text\"}, {\"bbox_2d\": [125, 286, 452, 300], \"label\": \"Text\"}, {\"bbox_2d\": [125, 310, 911, 324], \"label\": \"Text\"}, {\"bbox_2d\": [125, 536, 911, 697], \"label\": \"Text\"}, {\"bbox_2d\": [125, 811, 911, 875], \"label\": \"Text\"}, {\"bbox_2d\": [125, 736, 911, 789], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 346, 323, 359], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 382, 441, 396], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 409, 650, 423], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 443, 378, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 477, 911, 515], \"label\": \"List-item\"}, {\"bbox_2d\": [448, 905, 911, 944], \"label\": \"Formula\"}, {\"bbox_2d\": [513, 975, 522, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [125, 91, 911, 161], \"label\": \"Text\"}, {\"bbox_2d\": [125, 165, 911, 226], \"label\": \"Text\"}, {\"bbox_2d\": [125, 238, 911, 275], \"label\": \"Text\"}, {\"bbox_2d\": [151, 287, 567, 308], \"label\": \"Text\"}, {\"bbox_2d\": [151, 312, 911, 333], \"label\": \"Text\"}, {\"bbox_2d\": [125, 538, 911, 697], \"label\": \"Text\"}, {\"bbox_2d\": [125, 813, 911, 875], \"label\": \"Text\"}, {\"bbox_2d\": [125, 737, 911, 789], \"label\": \"Section-header\"}, {\"bbox_2d\": [448, 904, 911, 954], \"label\": \"Formula\"}, {\"bbox_2d\": [140, 346, 323, 359], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 380, 442, 402], \"label\": \"List-item\"}, {\"bbox_2d\": [175, 410, 651, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 444, 461, 465], \"label\": \"List-item\"}, {\"bbox_2d\": [126, 478, 911, 515], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 974, 522, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 512, 105], \"label\": \"Text\"}, {\"bbox_2d\": [231, 115, 729, 127], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 139, 508, 150], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 162, 600, 173], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 183, 829, 210], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 221, 895, 247], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 265, 926, 292], \"label\": \"Text\"}, {\"bbox_2d\": [231, 624, 926, 651], \"label\": \"Text\"}, {\"bbox_2d\": [231, 671, 455, 688], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 693, 919, 721], \"label\": \"Text\"}, {\"bbox_2d\": [231, 589, 645, 601], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 315, 928, 585], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 741, 612, 895], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 898, 659, 909], \"label\": \"Caption\"}, {\"bbox_2d\": [738, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 512, 105], \"label\": \"Text\"}, {\"bbox_2d\": [231, 264, 926, 292], \"label\": \"Text\"}, {\"bbox_2d\": [231, 220, 900, 248], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 182, 834, 210], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 160, 600, 172], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 138, 507, 150], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 115, 730, 127], \"label\": \"List-item\"}, {\"bbox_2d\": [230, 309, 927, 584], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 589, 644, 600], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 623, 926, 651], \"label\": \"Text\"}, {\"bbox_2d\": [231, 693, 921, 720], \"label\": \"Text\"}, {\"bbox_2d\": [231, 673, 456, 687], \"label\": \"Section-header\"}, {\"bbox_2d\": [229, 740, 611, 895], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 899, 666, 910], \"label\": \"Caption\"}, {\"bbox_2d\": [738, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [57, 33, 344, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [65, 93, 864, 927], \"label\": \"Picture\"}, {\"bbox_2d\": [406, 949, 619, 966], \"label\": \"Caption\"}, {\"bbox_2d\": [62, 984, 240, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [497, 977, 525, 990], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [56, 32, 355, 79], \"label\": \"Section-header\"}, {\"bbox_2d\": [63, 93, 868, 927], \"label\": \"Picture\"}, {\"bbox_2d\": [408, 952, 625, 967], \"label\": \"Caption\"}, {\"bbox_2d\": [64, 985, 246, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [500, 976, 532, 991], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [81, 559, 396, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 582, 597, 680], \"label\": \"Text\"}, {\"bbox_2d\": [81, 756, 490, 774], \"label\": \"Section-header\"}, {\"bbox_2d\": [305, 781, 930, 858], \"label\": \"Text\"}, {\"bbox_2d\": [823, 1003, 955, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [59, 735, 257, 875], \"label\": \"Picture\"}, {\"bbox_2d\": [664, 549, 930, 694], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 915, 695, 962], \"label\": \"Picture\"}, {\"bbox_2d\": [339, 914, 400, 972], \"label\": \"Picture\"}, {\"bbox_2d\": [856, 930, 956, 955], \"label\": \"Picture\"}, {\"bbox_2d\": [81, 927, 172, 955], \"label\": \"Picture\"}, {\"bbox_2d\": [297, 369, 499, 457], \"label\": \"Picture\"}, {\"bbox_2d\": [303, 450, 764, 524], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [81, 562, 396, 575], \"label\": \"Section-header\"}, {\"bbox_2d\": [304, 761, 491, 774], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 585, 597, 679], \"label\": \"Text\"}, {\"bbox_2d\": [304, 784, 932, 857], \"label\": \"Text\"}, {\"bbox_2d\": [822, 1005, 955, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [635, 547, 929, 693], \"label\": \"Picture\"}, {\"bbox_2d\": [48, 735, 258, 877], \"label\": \"Picture\"}, {\"bbox_2d\": [0, 0, 1035, 527], \"label\": \"Picture\"}, {\"bbox_2d\": [80, 914, 956, 970], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [525, 37, 545, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [118, 79, 957, 212], \"label\": \"Text\"}, {\"bbox_2d\": [118, 217, 957, 277], \"label\": \"Text\"}, {\"bbox_2d\": [118, 277, 957, 330], \"label\": \"Text\"}, {\"bbox_2d\": [118, 334, 957, 368], \"label\": \"Text\"}, {\"bbox_2d\": [118, 373, 957, 407], \"label\": \"Text\"}, {\"bbox_2d\": [118, 412, 957, 526], \"label\": \"Text\"}, {\"bbox_2d\": [118, 531, 957, 663], \"label\": \"Text\"}, {\"bbox_2d\": [118, 667, 957, 741], \"label\": \"Text\"}, {\"bbox_2d\": [118, 746, 957, 817], \"label\": \"Text\"}, {\"bbox_2d\": [118, 822, 957, 882], \"label\": \"Text\"}, {\"bbox_2d\": [175, 908, 957, 944], \"label\": \"Footnote\"}]```", "label": "```json[{\"bbox_2d\": [526, 41, 543, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [117, 77, 955, 212], \"label\": \"Text\"}, {\"bbox_2d\": [118, 215, 954, 270], \"label\": \"Text\"}, {\"bbox_2d\": [118, 274, 955, 329], \"label\": \"Text\"}, {\"bbox_2d\": [118, 332, 954, 368], \"label\": \"Text\"}, {\"bbox_2d\": [118, 372, 955, 407], \"label\": \"Text\"}, {\"bbox_2d\": [118, 411, 954, 524], \"label\": \"Text\"}, {\"bbox_2d\": [118, 528, 955, 662], \"label\": \"Text\"}, {\"bbox_2d\": [118, 666, 955, 740], \"label\": \"Text\"}, {\"bbox_2d\": [118, 744, 955, 819], \"label\": \"Text\"}, {\"bbox_2d\": [118, 823, 955, 879], \"label\": \"Text\"}, {\"bbox_2d\": [118, 903, 955, 943], \"label\": \"Footnote\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 248], \"label\": \"Text\"}, {\"bbox_2d\": [121, 259, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [147, 505, 580, 521], \"label\": \"Text\"}, {\"bbox_2d\": [121, 706, 911, 864], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 894, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [892, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 538, 804, 675], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [891, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 95, 911, 193], \"label\": \"Text\"}, {\"bbox_2d\": [121, 204, 910, 248], \"label\": \"Text\"}, {\"bbox_2d\": [121, 259, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [147, 505, 579, 521], \"label\": \"Text\"}, {\"bbox_2d\": [121, 894, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [121, 707, 911, 863], \"label\": \"Caption\"}, {\"bbox_2d\": [227, 538, 804, 676], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [69, 51, 506, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [70, 73, 548, 84], \"label\": \"Text\"}, {\"bbox_2d\": [61, 105, 912, 130], \"label\": \"Text\"}, {\"bbox_2d\": [61, 384, 902, 421], \"label\": \"Text\"}, {\"bbox_2d\": [61, 436, 911, 502], \"label\": \"Text\"}, {\"bbox_2d\": [61, 517, 888, 583], \"label\": \"Text\"}, {\"bbox_2d\": [61, 361, 384, 369], \"label\": \"Text\"}, {\"bbox_2d\": [61, 109, 915, 356], \"label\": \"Table\"}, {\"bbox_2d\": [62, 600, 711, 751], \"label\": \"Table\"}, {\"bbox_2d\": [61, 987, 281, 998], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [69, 45, 549, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 103, 915, 130], \"label\": \"Text\"}, {\"bbox_2d\": [60, 380, 912, 421], \"label\": \"Text\"}, {\"bbox_2d\": [60, 434, 921, 501], \"label\": \"Text\"}, {\"bbox_2d\": [60, 514, 900, 582], \"label\": \"Text\"}, {\"bbox_2d\": [59, 138, 917, 358], \"label\": \"Table\"}, {\"bbox_2d\": [60, 358, 385, 369], \"label\": \"Text\"}, {\"bbox_2d\": [62, 598, 711, 752], \"label\": \"Table\"}, {\"bbox_2d\": [60, 987, 282, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [564, 68, 940, 351], \"label\": \"Picture\"}, {\"bbox_2d\": [90, 68, 530, 351], \"label\": \"Picture\"}, {\"bbox_2d\": [627, 334, 657, 351], \"label\": \"Caption\"}, {\"bbox_2d\": [62, 377, 951, 444], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 487, 856, 500], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 521, 951, 609], \"label\": \"Text\"}, {\"bbox_2d\": [91, 488, 506, 608], \"label\": \"Text\"}, {\"bbox_2d\": [91, 590, 466, 615], \"label\": \"Text\"}, {\"bbox_2d\": [91, 681, 506, 713], \"label\": \"Text\"}, {\"bbox_2d\": [91, 747, 506, 794], \"label\": \"Text\"}, {\"bbox_2d\": [91, 811, 506, 899], \"label\": \"Text\"}, {\"bbox_2d\": [91, 904, 506, 949], \"label\": \"Text\"}, {\"bbox_2d\": [536, 626, 951, 659], \"label\": \"Text\"}, {\"bbox_2d\": [536, 717, 951, 743], \"label\": \"Text\"}, {\"bbox_2d\": [536, 799, 951, 819], \"label\": \"Text\"}, {\"bbox_2d\": [536, 832, 936, 853], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 858, 936, 880], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 886, 951, 936], \"label\": \"Text\"}, {\"bbox_2d\": [170, 657, 506, 680], \"label\": \"Formula\"}, {\"bbox_2d\": [175, 721, 506, 737], \"label\": \"Formula\"}, {\"bbox_2d\": [553, 731, 940, 792], \"label\": \"Formula\"}, {\"bbox_2d\": [124, 769, 470, 824], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [93, 66, 942, 350], \"label\": \"Picture\"}, {\"bbox_2d\": [934, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 488, 506, 605], \"label\": \"Text\"}, {\"bbox_2d\": [536, 522, 951, 609], \"label\": \"Text\"}, {\"bbox_2d\": [91, 378, 951, 443], \"label\": \"Caption\"}, {\"bbox_2d\": [630, 489, 857, 500], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 627, 951, 654], \"label\": \"Text\"}, {\"bbox_2d\": [91, 618, 506, 645], \"label\": \"Text\"}, {\"bbox_2d\": [91, 683, 506, 710], \"label\": \"Text\"}, {\"bbox_2d\": [91, 747, 506, 759], \"label\": \"Text\"}, {\"bbox_2d\": [536, 804, 947, 816], \"label\": \"Text\"}, {\"bbox_2d\": [91, 814, 506, 901], \"label\": \"Text\"}, {\"bbox_2d\": [91, 904, 506, 946], \"label\": \"Text\"}, {\"bbox_2d\": [536, 919, 951, 946], \"label\": \"Text\"}, {\"bbox_2d\": [550, 833, 937, 902], \"label\": \"Formula\"}, {\"bbox_2d\": [554, 757, 933, 779], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 669, 967, 700], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 657, 425, 670], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 722, 425, 735], \"label\": \"Formula\"}, {\"bbox_2d\": [127, 774, 469, 803], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 715, 949, 742], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 97, 331, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 347], \"label\": \"Text\"}, {\"bbox_2d\": [121, 361, 911, 704], \"label\": \"Text\"}, {\"bbox_2d\": [121, 717, 911, 868], \"label\": \"Text\"}, {\"bbox_2d\": [121, 881, 911, 950], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 97, 331, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 348], \"label\": \"Text\"}, {\"bbox_2d\": [121, 360, 911, 704], \"label\": \"Text\"}, {\"bbox_2d\": [121, 716, 911, 868], \"label\": \"Text\"}, {\"bbox_2d\": [121, 880, 911, 950], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 49, 116, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 74, 70, 87], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 169, 76, 182], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 300, 81, 313], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 395, 81, 408], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 456, 116, 468], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 492, 159, 501], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 514, 968, 537], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 98, 209, 109], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 111, 310, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 123, 185, 133], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 135, 238, 145], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 147, 275, 158], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 192, 886, 203], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 205, 272, 215], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 217, 702, 227], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 229, 522, 240], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 242, 426, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 254, 820, 264], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 266, 289, 276], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 278, 247, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 324, 498, 334], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 336, 289, 346], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 348, 794, 358], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 360, 629, 370], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 372, 387, 382], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 420, 391, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 465, 798, 477], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [25, 50, 116, 63], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 75, 70, 86], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 170, 76, 181], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 301, 81, 312], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 396, 80, 407], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 99, 208, 109], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 111, 311, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 123, 185, 134], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 135, 238, 146], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 147, 276, 158], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 194, 886, 204], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 206, 274, 216], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 218, 701, 229], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 230, 522, 241], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 242, 425, 253], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 254, 708, 265], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 266, 290, 277], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 278, 246, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 325, 499, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 337, 288, 348], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 349, 794, 360], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 361, 630, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 373, 386, 384], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 420, 392, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [65, 432, 262, 443], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 514, 968, 537], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 455, 118, 466], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 467, 799, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 491, 158, 502], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [884, 974, 895, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [736, 62, 902, 72], \"label\": \"Page-header\"}, {\"bbox_2d\": [460, 74, 624, 85], \"label\": \"Section-header\"}, {\"bbox_2d\": [180, 134, 867, 164], \"label\": \"Table\"}, {\"bbox_2d\": [109, 33, 292, 125], \"label\": \"Picture\"}, {\"bbox_2d\": [828, 84, 903, 122], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [458, 72, 627, 86], \"label\": \"Text\"}, {\"bbox_2d\": [735, 60, 905, 71], \"label\": \"Page-header\"}, {\"bbox_2d\": [112, 34, 291, 122], \"label\": \"Picture\"}, {\"bbox_2d\": [829, 82, 901, 120], \"label\": \"Picture\"}, {\"bbox_2d\": [179, 130, 872, 166], \"label\": \"Table\"}, {\"bbox_2d\": [886, 973, 893, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 164], \"label\": \"Text\"}, {\"bbox_2d\": [121, 213, 892, 229], \"label\": \"Text\"}, {\"bbox_2d\": [121, 239, 911, 360], \"label\": \"Text\"}, {\"bbox_2d\": [121, 438, 911, 585], \"label\": \"Text\"}, {\"bbox_2d\": [121, 705, 911, 799], \"label\": \"Text\"}, {\"bbox_2d\": [121, 852, 911, 920], \"label\": \"Text\"}, {\"bbox_2d\": [121, 838, 320, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 970, 527, 984], \"label\": \"Page-footer\"}, {\"bbox_2d\": [288, 610, 911, 684], \"label\": \"Formula\"}, {\"bbox_2d\": [290, 384, 911, 420], \"label\": \"Formula\"}, {\"bbox_2d\": [317, 176, 523, 191], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 914, 161], \"label\": \"Text\"}, {\"bbox_2d\": [121, 215, 891, 227], \"label\": \"Text\"}, {\"bbox_2d\": [121, 240, 914, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 440, 914, 583], \"label\": \"Text\"}, {\"bbox_2d\": [121, 705, 914, 795], \"label\": \"Text\"}, {\"bbox_2d\": [121, 881, 914, 919], \"label\": \"Text\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [291, 610, 914, 677], \"label\": \"Formula\"}, {\"bbox_2d\": [319, 384, 914, 413], \"label\": \"Formula\"}, {\"bbox_2d\": [401, 175, 634, 198], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 838, 321, 855], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 69, 918, 103], \"label\": \"Title\"}, {\"bbox_2d\": [483, 119, 559, 133], \"label\": \"Text\"}, {\"bbox_2d\": [200, 139, 844, 152], \"label\": \"Text\"}, {\"bbox_2d\": [156, 161, 859, 323], \"label\": \"Text\"}, {\"bbox_2d\": [184, 338, 408, 350], \"label\": \"Text\"}, {\"bbox_2d\": [536, 377, 927, 390], \"label\": \"Text\"}, {\"bbox_2d\": [108, 414, 239, 427], \"label\": \"Section-header\"}, {\"bbox_2d\": [264, 378, 333, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 445, 478, 473], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 491, 406, 504], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 521, 474, 550], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 556, 482, 583], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 589, 482, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 633, 236, 646], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 663, 278, 676], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 696, 418, 723], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 742, 482, 783], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 802, 220, 815], \"label\": \"Section-header\"}, {\"bbox_2d\": [206, 856, 390, 869], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 890, 506, 950], \"label\": \"Text\"}, {\"bbox_2d\": [536, 754, 927, 931], \"label\": \"Text\"}, {\"bbox_2d\": [536, 937, 927, 950], \"label\": \"Text\"}, {\"bbox_2d\": [536, 501, 927, 727], \"label\": \"Text\"}, {\"bbox_2d\": [545, 404, 927, 492], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 216, 173, 233], \"label\": \"Page-header\"}, {\"bbox_2d\": [31, 217, 62, 756], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [125, 70, 917, 102], \"label\": \"Title\"}, {\"bbox_2d\": [184, 162, 858, 324], \"label\": \"Text\"}, {\"bbox_2d\": [198, 140, 844, 151], \"label\": \"Text\"}, {\"bbox_2d\": [31, 220, 61, 753], \"label\": \"Page-header\"}, {\"bbox_2d\": [184, 340, 407, 350], \"label\": \"Text\"}, {\"bbox_2d\": [208, 856, 389, 868], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 891, 506, 949], \"label\": \"Text\"}, {\"bbox_2d\": [553, 935, 951, 949], \"label\": \"Text\"}, {\"bbox_2d\": [536, 756, 951, 934], \"label\": \"Text\"}, {\"bbox_2d\": [536, 499, 951, 754], \"label\": \"Text\"}, {\"bbox_2d\": [536, 379, 945, 392], \"label\": \"Text\"}, {\"bbox_2d\": [545, 404, 951, 488], \"label\": \"Formula\"}, {\"bbox_2d\": [263, 380, 333, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [482, 119, 559, 133], \"label\": \"Text\"}, {\"bbox_2d\": [107, 414, 506, 427], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 445, 506, 473], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 492, 506, 505], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 523, 506, 551], \"label\": \"List-item\"}, {\"bbox_2d\": [127, 555, 506, 583], \"label\": \"List-item\"}, {\"bbox_2d\": [128, 586, 506, 614], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 633, 506, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [127, 664, 506, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 695, 506, 723], \"label\": \"List-item\"}, {\"bbox_2d\": [100, 742, 506, 785], \"label\": \"List-item\"}, {\"bbox_2d\": [127, 803, 514, 816], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [44, 999, 169, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [73, 101, 233, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 422, 179, 433], \"label\": \"Section-header\"}, {\"bbox_2d\": [47, 16, 288, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [73, 332, 915, 360], \"label\": \"Text\"}, {\"bbox_2d\": [111, 360, 463, 371], \"label\": \"Text\"}, {\"bbox_2d\": [111, 373, 872, 384], \"label\": \"Text\"}, {\"bbox_2d\": [73, 662, 915, 693], \"label\": \"Text\"}, {\"bbox_2d\": [111, 694, 465, 704], \"label\": \"Text\"}, {\"bbox_2d\": [73, 441, 916, 658], \"label\": \"Table\"}, {\"bbox_2d\": [71, 118, 916, 331], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 289, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [73, 102, 234, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 423, 179, 433], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 331, 915, 360], \"label\": \"Text\"}, {\"bbox_2d\": [111, 363, 465, 371], \"label\": \"Text\"}, {\"bbox_2d\": [111, 373, 875, 392], \"label\": \"Text\"}, {\"bbox_2d\": [72, 665, 915, 694], \"label\": \"Text\"}, {\"bbox_2d\": [111, 696, 465, 705], \"label\": \"Text\"}, {\"bbox_2d\": [71, 120, 915, 324], \"label\": \"Table\"}, {\"bbox_2d\": [71, 441, 915, 658], \"label\": \"Table\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [54, 53, 986, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [538, 179, 693, 189], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 344, 406, 375], \"label\": \"Section-header\"}, {\"bbox_2d\": [54, 648, 261, 677], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 914, 474, 944], \"label\": \"Section-header\"}, {\"bbox_2d\": [538, 900, 740, 976], \"label\": \"Text\"}, {\"bbox_2d\": [538, 746, 739, 888], \"label\": \"Text\"}, {\"bbox_2d\": [538, 598, 739, 737], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [768, 524, 968, 665], \"label\": \"Text\"}, {\"bbox_2d\": [768, 676, 967, 785], \"label\": \"Text\"}, {\"bbox_2d\": [768, 797, 967, 896], \"label\": \"Text\"}, {\"bbox_2d\": [538, 381, 584, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 381, 501, 503], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [294, 381, 501, 391], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 586], \"label\": \"Text\"}, {\"bbox_2d\": [538, 524, 729, 58", "label": "```json[{\"bbox_2d\": [53, 57, 987, 86], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 170, 263, 292], \"label\": \"Text\"}, {\"bbox_2d\": [53, 306, 249, 412], \"label\": \"Text\"}, {\"bbox_2d\": [53, 425, 258, 516], \"label\": \"Text\"}, {\"bbox_2d\": [53, 529, 258, 635], \"label\": \"Text\"}, {\"bbox_2d\": [53, 682, 265, 788], \"label\": \"Text\"}, {\"bbox_2d\": [53, 801, 257, 987], \"label\": \"Text\"}, {\"bbox_2d\": [292, 946, 488, 989], \"label\": \"Text\"}, {\"bbox_2d\": [292, 650, 504, 900], \"label\": \"Text\"}, {\"bbox_2d\": [292, 514, 502, 636], \"label\": \"Text\"}, {\"bbox_2d\": [292, 379, 502, 501], \"label\": \"Text\"}, {\"bbox_2d\": [292, 226, 498, 332], \"label\": \"Text\"}, {\"bbox_2d\": [292, 170, 504, 213], \"label\": \"Text\"}, {\"bbox_2d\": [530, 168, 986, 510], \"label\": \"Picture\"}, {\"bbox_2d\": [292, 346, 485, 373], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 649, 261, 676], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 914, 472, 941], \"label\": \"Section-header\"}, {\"bbox_2d\": [31, 1003, 222, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [530, 525, 725, 583], \"label\": \"Text\"}, {\"bbox_2d\": [530, 597, 735, 735], \"label\": \"Text\"}, {\"bbox_2d\": [530, 748, 738, 886], \"label\": \"Text\"}, {\"bbox_2d\": [530, 900, 739, 974], \"label\": \"Text\"}, {\"bbox_2d\": [769, 796, 980, 934], \"label\": \"Text\"}, {\"bbox_2d\": [769, 676, 981, 783], \"label\": \"Text\"}, {\"bbox_2d\": [769, 525, 970, 663], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 46, 946, 74], \"label\": \"Text\"}, {\"bbox_2d\": [62, 208, 507, 220], \"label\": \"Text\"}, {\"bbox_2d\": [62, 416, 971, 522], \"label\": \"Text\"}, {\"bbox_2d\": [62, 540, 955, 616], \"label\": \"Text\"}, {\"bbox_2d\": [62, 635, 972, 728], \"label\": \"Text\"}, {\"bbox_2d\": [62, 309, 952, 331], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 286, 934, 299], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 247, 946, 268], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 351, 465, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 382, 341, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [77, 318, 952, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [62, 176, 183, 189], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 249, 109, 261], \"label\": \"Picture\"}, {\"bbox_2d\": [278, 91, 761, 136], \"label\": \"Picture\"}, {\"bbox_2d\": [92, 286, 109, 298], \"label\": \"Picture\"}, {\"bbox_2d\": [92, 351, 109, 362], \"label\": \"Picture\"}, {\"bbox_2d\": [997, 350, 1008, 687], \"label\": \"Page-header\"}, {\"bbox_2d\": [508, 998, 527, 1008], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [508, 993, 527, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [994, 349, 1013, 686], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 43, 951, 76], \"label\": \"Text\"}, {\"bbox_2d\": [60, 204, 508, 222], \"label\": \"Text\"}, {\"bbox_2d\": [60, 412, 976, 524], \"label\": \"Text\"}, {\"bbox_2d\": [60, 538, 961, 618], \"label\": \"Text\"}, {\"bbox_2d\": [60, 632, 979, 728], \"label\": \"Text\"}, {\"bbox_2d\": [60, 173, 183, 191], \"label\": \"Section-header\"}, {\"bbox_2d\": [266, 87, 765, 140], \"label\": \"Picture\"}, {\"bbox_2d\": [67, 234, 954, 397], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [128, 18, 261, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [73, 101, 434, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 121, 275, 131], \"label\": \"Section-header\"}, {\"bbox_2d\": [71, 152, 916, 900], \"label\": \"Table\"}, {\"bbox_2d\": [72, 903, 387, 911], \"label\": \"Text\"}, {\"bbox_2d\": [72, 911, 895, 931], \"label\": \"Text\"}, {\"bbox_2d\": [45, 999, 170, 1015], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [37, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 261, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 154, 917, 447], \"label\": \"Table\"}, {\"bbox_2d\": [71, 486, 917, 744], \"label\": \"Table\"}, {\"bbox_2d\": [72, 461, 227, 483], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 759, 138, 768], \"label\": \"Section-header\"}, {\"bbox_2d\": [70, 771, 915, 900], \"label\": \"Table\"}, {\"bbox_2d\": [72, 903, 387, 912], \"label\": \"Text\"}, {\"bbox_2d\": [111, 914, 898, 933], \"label\": \"Text\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [72, 102, 435, 129], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [185, 89, 630, 103], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 107, 883, 152], \"label\": \"Text\"}, {\"bbox_2d\": [185, 159, 532, 172], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 179, 634, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 200, 883, 226], \"label\": \"Text\"}, {\"bbox_2d\": [185, 234, 677, 247], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 254, 883, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [227, 288, 567, 301], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 309, 624, 322], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 329, 883, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 396, 434, 409], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 417, 308, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 437, 277, 450], \"label\": \"List-item\"}, {\"bbox_2d\": [227, 457, 883, 513], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 521, 880, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 556, 715, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 636, 811, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 655, 756, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 674, 883, 715], \"label\": \"Text\"}, {\"bbox_2d\": [172, 723, 817, 736], \"label\": \"Text\"}, {\"bbox_2d\": [156, 743, 883, 770], \"label\": \"Text\"}, {\"bbox_2d\": [156, 792, 577, 805], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 817, 464, 829], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 837, 469, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 857, 655, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 877, 883, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 912, 607, 924], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 589, 265, 602], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 614, 478, 627], \"label\": \"Text\"}, {\"bbox_2d\": [156, 363, 883, 390], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [186, 91, 636, 103], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 160, 536, 172], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 180, 638, 192], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 234, 682, 246], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 254, 884, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 289, 571, 301], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 309, 627, 320], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 328, 884, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 417, 312, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 437, 281, 449], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 457, 884, 513], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 521, 884, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 556, 720, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 635, 816, 647], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 655, 760, 666], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 838, 474, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 857, 658, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 877, 884, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 912, 612, 924], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 111, 884, 152], \"label\": \"Text\"}, {\"bbox_2d\": [156, 200, 884, 226], \"label\": \"Text\"}, {\"bbox_2d\": [156, 363, 884, 390], \"label\": \"Text\"}, {\"bbox_2d\": [171, 398, 437, 409], \"label\": \"Text\"}, {\"bbox_2d\": [171, 615, 482, 627], \"label\": \"Text\"}, {\"bbox_2d\": [156, 674, 884, 716], \"label\": \"Text\"}, {\"bbox_2d\": [171, 724, 822, 736], \"label\": \"Text\"}, {\"bbox_2d\": [156, 744, 884, 770], \"label\": \"Text\"}, {\"bbox_2d\": [171, 818, 467, 830], \"label\": \"Text\"}, {\"bbox_2d\": [156, 793, 581, 805], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 591, 269, 602], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 102, 477, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 177, 260, 191], \"label\": \"Section-header\"}, {\"bbox_2d\": [657, 109, 763, 119], \"label\": \"Text\"}, {\"bbox_2d\": [657, 122, 913, 132], \"label\": \"Text\"}, {\"bbox_2d\": [72, 208, 221, 239], \"label\": \"Text\"}, {\"bbox_2d\": [72, 249, 189, 260], \"label\": \"Text\"}, {\"bbox_2d\": [72, 269, 215, 280], \"label\": \"Text\"}, {\"bbox_2d\": [72, 290, 122, 301], \"label\": \"Text\"}, {\"bbox_2d\": [72, 332, 185, 343], \"label\": \"Text\"}, {\"bbox_2d\": [72, 352, 320, 374], \"label\": \"Text\"}, {\"bbox_2d\": [72, 384, 407, 418], \"label\": \"Text\"}, {\"bbox_2d\": [72, 427, 251, 451], \"label\": \"Text\"}, {\"bbox_2d\": [72, 461, 233, 472], \"label\": \"Text\"}, {\"bbox_2d\": [72, 497, 460, 543], \"label\": \"Text\"}, {\"bbox_2d\": [72, 568, 239, 580], \"label\": \"Text\"}, {\"bbox_2d\": [72, 589, 270, 599], \"label\": \"Text\"}, {\"bbox_2d\": [72, 614, 470, 648], \"label\": \"Text\"}, {\"bbox_2d\": [72, 658, 220, 669], \"label\": \"Text\"}, {\"bbox_2d\": [72, 678, 437, 692], \"label\": \"Text\"}, {\"bbox_2d\": [72, 701, 367, 725], \"label\": \"Text\"}, {\"bbox_2d\": [72, 758, 442, 799], \"label\": \"Text\"}, {\"bbox_2d\": [72, 824, 216, 834], \"label\": \"Text\"}, {\"bbox_2d\": [72, 849, 244, 861], \"label\": \"Text\"}, {\"bbox_2d\": [72, 871, 208, 883], \"label\": \"Text\"}, {\"bbox_2d\": [72, 926, 173, 956], \"label\": \"Text\"}, {\"bbox_2d\": [72, 962, 214, 988], \"label\": \"Text\"}, {\"bbox_2d\": [42, 1001, 169, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [518, 926, 908, 936], \"label\": \"Text\"}, {\"bbox_2d\": [518, 939, 645, 956], \"label\": \"Text\"}, {\"bbox_2d\": [518, 961, 908, 980], \"label\": \"Text\"}, {\"bbox_2d\": [518, 871, 900, 905], \"label\": \"Text\"}, {\"bbox_2d\": [518, 799, 902, 848], \"label\": \"Text\"}, {\"bbox_2d\": [518, 764, 902, 797], \"label\": \"Text\"}, {\"bbox_2d\": [518, 712, 874, 723], \"label\": \"Text\"}, {\"bbox_2d\": [518, 685, 669, 701], \"label\": \"Text\"}, {\"bbox_2d\": [518, 652, 853, 675], \"label\": \"Text\"}, {\"bbox_2d\": [518, 606, 900, 649], \"label\": \"Text\"}, {\"bbox_2d\": [518, 541, 898, 603], \"label\": \"Text\"}, {\"bbox_2d\": [518, 484, 732, 531], \"label\": \"Text\"}, {\"bbox_2d\": [518, 433, 701, 457], \"label\": \"Text\"}, {\"bbox_2d\": [518, 367, 725, 390], \"label\": \"Text\"}, {\"bbox_2d\": [518, 346, 637, 357], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 275, 641, 290], \"label\": \"Text\"}, {\"bbox_2d\": [518, 245, 671, 268], \"label\": \"Text\"}, {\"bbox_2d\": [518, 212, 671, 235], \"label\": \"Text\"}, {\"bbox_2d\": [518, 191, 672, 203], \"label\": \"Text\"}, {\"bbox_2d\": [72, 391, 407, 418], \"label\": \"Text\"}, {\"bbox_2d\": [72, 427, 197, 437], \"label\": \"Text\"}, {\"bbox_2d\": [72, 457, 233, 472], \"label\": \"Text\"}, {\"bbox_2d\": [72, 497, 347, 511], \"label\": \"Text\"}, {\"bbox_2d\": [72, 536, 197, 546], \"label\": \"Text\"}, {\"bbox_2d\": [72, 568, 239, 580], \"label\": \"Text\"}, {\"bbox_2d\": [72, 589, 358, 603], \"label\": \"Text\"}, {\"bbox_2d\": [72, 627, 469, 648], \"label\": \"Text\"}, {\"bbox_2d\": [72, 669, 319, 692], \"label\": \"Text\"}, {\"bbox_2d\": [72, 701, 437, 718], \"label\": \"Text\"}, {\"bbox_2d\": [72, 758, 182, 767], \"label\": \"Text\"}, {\"bbox_2d\": [72, 788, 442, 799], \"label\": \"Text\"}, {\"bbox_2d\": [72, 824, 216, 834], \"label\": \"Text\"}, {\"bbox_2d\": [72, 849, 244, 861], \"label\": \"Text\"}, {\"bbox_2d\": [518, 860, 679, 871], \"label\": \"Text\"}, {\"bbox_2d\": [72, 871, 208, 883], \"label\": \"Text\"}, {\"bbox_2d\": [72, 926, 173, 956], \"label\": \"Text\"}, {\"bbox_2d\": [518, 911, 645, 922], \"label\": \"Text\"}, {\"bbox_2d\": [72, 962, 214, 988], \"label\": \"", "label": "```json[{\"bbox_2d\": [517, 367, 675, 379], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 348, 636, 357], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 466, 669, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 432, 700, 445], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 509, 692, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 542, 661, 554], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 609, 643, 621], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 652, 636, 665], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 685, 670, 698], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 718, 623, 731], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 762, 638, 774], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 805, 695, 817], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 859, 679, 871], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 913, 644, 925], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 946, 645, 958], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 381, 725, 390], \"label\": \"Text\"}, {\"bbox_2d\": [517, 414, 641, 423], \"label\": \"Text\"}, {\"bbox_2d\": [517, 447, 696, 456], \"label\": \"Text\"}, {\"bbox_2d\": [517, 480, 732, 499], \"label\": \"Text\"}, {\"bbox_2d\": [517, 399, 695, 412], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 524, 796, 532], \"label\": \"Text\"}, {\"bbox_2d\": [517, 557, 900, 599], \"label\": \"Text\"}, {\"bbox_2d\": [517, 624, 897, 643], \"label\": \"Text\"}, {\"bbox_2d\": [517, 667, 855, 676], \"label\": \"Text\"}, {\"bbox_2d\": [517, 700, 692, 709], \"label\": \"Text\"}, {\"bbox_2d\": [517, 733, 878, 752], \"label\": \"Text\"}, {\"bbox_2d\": [517, 776, 906, 795], \"label\": \"Text\"}, {\"bbox_2d\": [517, 820, 901, 849], \"label\": \"Text\"}, {\"bbox_2d\": [517, 874, 907, 903], \"label\": \"Text\"}, {\"bbox_2d\": [517, 927, 908, 936], \"label\": \"Text\"}, {\"bbox_2d\": [517, 960, 912, 979], \"label\": \"Text\"}, {\"bbox_2d\": [517, 193, 672, 202], \"label\": \"Text\"}, {\"bbox_2d\": [517, 226, 672, 235], \"label\": \"Text\"}, {\"bbox_2d\": [517, 259, 672, 268], \"label\": \"Text\"}, {\"bbox_2d\": [517, 292, 620, 301], \"label\": \"Text\"}, {\"bbox_2d\": [72, 366, 320, 375], \"label\": \"Text\"}, {\"bbox_2d\": [529, 320, 696, 331], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 278, 641, 290], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 245, 608, 257], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 212, 594, 224], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 179, 627, 191], \"label\": \"Section-header\"}, {\"bbox_2d\": [657, 111, 914, 130], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 105, 482, 120], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 153, 739, 165], \"label\": \"Section-header\"}, {\"bbox_2d\": [84, 181, 259, 191], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 208, 221, 240], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 291, 187, 322], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 333, 192, 364], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 250, 214, 281], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 384, 217, 397], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 427, 247, 440], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 481, 211, 493], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 399, 411, 418], \"label\": \"Text\"}, {\"bbox_2d\": [72, 443, 256, 451], \"label\": \"Text\"}, {\"bbox_2d\": [72, 462, 233, 472], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 524, 256, 537], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 496, 464, 515], \"label\": \"Text\"}, {\"bbox_2d\": [72, 539, 279, 560], \"label\": \"Text\"}, {\"bbox_2d\": [72, 584, 268, 592], \"label\": \"Text\"}, {\"bbox_2d\": [72, 617, 473, 648], \"label\": \"Text\"}, {\"bbox_2d\": [72, 569, 236, 582], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 602, 222, 614], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 672, 437, 692], \"label\": \"Text\"}, {\"bbox_2d\": [72, 717, 366, 725], \"label\": \"Text\"}, {\"bbox_2d\": [72, 657, 216, 670], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 701, 271, 714], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 736, 184, 746], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 755, 177, 768], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 810, 182, 820], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 829, 216, 842], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 851, 244, 863], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 872, 209, 884], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 770, 444, 799], \"label\": \"Text\"}, {\"bbox_2d\": [72, 944, 174, 952], \"label\": \"Text\"}, {\"bbox_2d\": [72, 977, 174, 985], \"label\": \"Text\"}, {\"bbox_2d\": [43, 999, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 902, 264, 913], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 929, 167, 941], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 962, 215, 974], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 72, 400, 88], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 406, 945, 579], \"label\": \"Text\"}, {\"bbox_2d\": [91, 594, 945, 704], \"label\": \"Text\"}, {\"bbox_2d\": [91, 719, 945, 830], \"label\": \"Text\"}, {\"bbox_2d\": [91, 844, 945, 954], \"label\": \"Text\"}, {\"bbox_2d\": [507, 988, 526, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 90, 847, 373], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [91, 74, 398, 88], \"label\": \"Caption\"}, {\"bbox_2d\": [93, 92, 847, 371], \"label\": \"Text\"}, {\"bbox_2d\": [91, 408, 944, 578], \"label\": \"Text\"}, {\"bbox_2d\": [91, 594, 944, 703], \"label\": \"Text\"}, {\"bbox_2d\": [91, 720, 944, 828], \"label\": \"Text\"}, {\"bbox_2d\": [91, 845, 944, 953], \"label\": \"Text\"}, {\"bbox_2d\": [508, 993, 527, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [219, 38, 814, 47], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 79, 1009, 138], \"label\": \"Text\"}, {\"bbox_2d\": [25, 180, 1009, 272], \"label\": \"Text\"}, {\"bbox_2d\": [25, 282, 1009, 395], \"label\": \"Text\"}, {\"bbox_2d\": [25, 405, 1009, 457], \"label\": \"Text\"}, {\"bbox_2d\": [25, 487, 1009, 532], \"label\": \"Text\"}, {\"bbox_2d\": [43, 472, 220, 482], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 546, 204, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 563, 1009, 619], \"label\": \"Text\"}, {\"bbox_2d\": [504, 643, 529, 653], \"label\": \"Page-footer\"}, {\"bbox_2d\": [43, 150, 310, 176], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [219, 37, 815, 48], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 150, 311, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 165, 175, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 472, 218, 482], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 545, 203, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [504, 640, 529, 651], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 561, 1009, 617], \"label\": \"Text\"}, {\"bbox_2d\": [25, 487, 1009, 532], \"label\": \"Text\"}, {\"bbox_2d\": [25, 402, 1009, 458], \"label\": \"Text\"}, {\"bbox_2d\": [25, 280, 1009, 393], \"label\": \"Text\"}, {\"bbox_2d\": [25, 181, 1009, 271], \"label\": \"Text\"}, {\"bbox_2d\": [25, 80, 1009, 136], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [64, 43, 352, 88], \"label\": \"Page-header\"}, {\"bbox_2d\": [62, 104, 480, 585], \"label\": \"Text\"}, {\"bbox_2d\": [62, 590, 480, 774], \"label\": \"Text\"}, {\"bbox_2d\": [62, 779, 481, 960], \"label\": \"Text\"}, {\"bbox_2d\": [517, 590, 931, 761], \"label\": \"Text\"}, {\"bbox_2d\": [517, 767, 933, 961], \"label\": \"Text\"}, {\"bbox_2d\": [482, 974, 512, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [517, 104, 933, 586], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [65, 42, 362, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [62, 101, 492, 587], \"label\": \"Text\"}, {\"bbox_2d\": [516, 101, 946, 587], \"label\": \"Text\"}, {\"bbox_2d\": [63, 588, 489, 774], \"label\": \"Text\"}, {\"bbox_2d\": [516, 588, 946, 762], \"label\": \"Text\"}, {\"bbox_2d\": [517, 764, 939, 961], \"label\": \"Text\"}, {\"bbox_2d\": [63, 776, 490, 961], \"label\": \"Text\"}, {\"bbox_2d\": [483, 971, 525, 986], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [155, 128, 880, 268], \"label\": \"Text\"}, {\"bbox_2d\": [155, 369, 880, 417], \"label\": \"Text\"}, {\"bbox_2d\": [155, 444, 880, 519], \"label\": \"Text\"}, {\"bbox_2d\": [155, 735, 880, 782], \"label\": \"Text\"}, {\"bbox_2d\": [155, 808, 880, 851], \"label\": \"Text\"}, {\"bbox_2d\": [508, 877, 527, 891], \"label\": \"Page-footer\"}, {\"bbox_2d\": [155, 662, 880, 709], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 588, 880, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [179, 547, 515, 560], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 320, 786, 339], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [155, 319, 785, 339], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 128, 880, 268], \"label\": \"Text\"}, {\"bbox_2d\": [155, 370, 880, 416], \"label\": \"Text\"}, {\"bbox_2d\": [155, 443, 880, 519], \"label\": \"Text\"}, {\"bbox_2d\": [178, 546, 515, 560], \"label\": \"List-item\"}, {\"bbox_2d\": [178, 588, 879, 634], \"label\": \"List-item\"}, {\"bbox_2d\": [178, 659, 880, 708], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 735, 880, 780], \"label\": \"Text\"}, {\"bbox_2d\": [155, 807, 880, 852], \"label\": \"Text\"}, {\"bbox_2d\": [508, 877, 527, 891], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [152, 71, 965, 95], \"label\": \"Text\"}, {\"bbox_2d\": [152, 110, 965, 162], \"label\": \"Text\"}, {\"bbox_2d\": [152, 176, 965, 228], \"label\": \"Text\"}, {\"bbox_2d\": [152, 242, 965, 294], \"label\": \"Text\"}, {\"bbox_2d\": [152, 307, 965, 386], \"label\": \"Text\"}, {\"bbox_2d\": [152, 400, 965, 494], \"label\": \"Text\"}, {\"bbox_2d\": [152, 544, 965, 596], \"label\": \"Text\"}, {\"bbox_2d\": [152, 610, 965, 634], \"label\": \"Text\"}, {\"bbox_2d\": [152, 685, 619, 695], \"label\": \"Text\"}, {\"bbox_2d\": [152, 829, 965, 852], \"label\": \"Text\"}, {\"bbox_2d\": [121, 514, 254, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 656, 247, 668], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 705, 964, 813], \"label\": \"Table\"}, {\"bbox_2d\": [52, 983, 64, 994], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [152, 70, 965, 95], \"label\": \"Text\"}, {\"bbox_2d\": [152, 109, 966, 161], \"label\": \"Text\"}, {\"bbox_2d\": [152, 174, 966, 227], \"label\": \"Text\"}, {\"bbox_2d\": [152, 240, 966, 292], \"label\": \"Text\"}, {\"bbox_2d\": [152, 306, 966, 386], \"label\": \"Text\"}, {\"bbox_2d\": [152, 399, 966, 493], \"label\": \"Text\"}, {\"bbox_2d\": [152, 544, 966, 596], \"label\": \"Text\"}, {\"bbox_2d\": [152, 609, 966, 634], \"label\": \"Text\"}, {\"bbox_2d\": [152, 685, 624, 695], \"label\": \"Text\"}, {\"bbox_2d\": [152, 827, 965, 852], \"label\": \"Text\"}, {\"bbox_2d\": [122, 658, 247, 669], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 515, 259, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [150, 702, 627, 813], \"label\": \"Table\"}, {\"bbox_2d\": [52, 982, 67, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 140, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [36, 61, 337, 72], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 78, 719, 89], \"label\": \"Text\"}, {\"bbox_2d\": [25, 145, 1009, 203], \"label\": \"Text\"}, {\"bbox_2d\": [25, 219, 1009, 253], \"label\": \"Text\"}, {\"bbox_2d\": [25, 258, 1009, 303], \"label\": \"Text\"}, {\"bbox_2d\": [25, 308, 1009, 378], \"label\": \"Text\"}, {\"bbox_2d\": [25, 384, 1009, 405], \"label\": \"Text\"}, {\"bbox_2d\": [25, 506, 1009, 528], \"label\": \"Text\"}, {\"bbox_2d\": [59, 556, 619, 566], \"label\": \"Text\"}, {\"bbox_2d\": [25, 630, 1009, 664], \"label\": \"Text\"}, {\"bbox_2d\": [59, 538, 260, 549], \"label\": \"Section-header\"}, {\"bbox_2d\": [42, 413, 204, 424], \"label\": \"Section-header\"}, {\"bbox_2d\": [59, 430, 537, 441], \"label\": \"Text\"}, {\"bbox_2d\": [24, 453, 1009, 500], \"label\": \"Table\"}, {\"bbox_2d\": [25, 576, 1009, 621], \"label\": \"Table\"}, {\"bbox_2d\": [25, 587, 1009, 638], \"label\": \"Table\"}, {\"bbox_2d\": [508, 693, 525, 704], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 91, 1009, 138], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 140, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [37, 61, 336, 72], \"label\": \"Section-header\"}, {\"bbox_2d\": [61, 78, 719, 89], \"label\": \"Text\"}, {\"bbox_2d\": [22, 91, 1011, 137], \"label\": \"Table\"}, {\"bbox_2d\": [25, 145, 1009, 212], \"label\": \"Text\"}, {\"bbox_2d\": [25, 218, 1009, 252], \"label\": \"Text\"}, {\"bbox_2d\": [25, 258, 1009, 303], \"label\": \"Text\"}, {\"bbox_2d\": [25, 309, 1009, 377], \"label\": \"Text\"}, {\"bbox_2d\": [25, 383, 1009, 405], \"label\": \"Text\"}, {\"bbox_2d\": [58, 431, 537, 442], \"label\": \"Text\"}, {\"bbox_2d\": [43, 414, 204, 425], \"label\": \"Section-header\"}, {\"bbox_2d\": [23, 453, 1011, 499], \"label\": \"Table\"}, {\"bbox_2d\": [25, 506, 1009, 528], \"label\": \"Text\"}, {\"bbox_2d\": [58, 555, 619, 565], \"label\": \"Text\"}, {\"bbox_2d\": [25, 630, 1009, 663], \"label\": \"Text\"}, {\"bbox_2d\": [58, 538, 260, 548], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 576, 1013, 622], \"label\": \"Table\"}, {\"bbox_2d\": [508, 693, 525, 704], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [311, 54, 325, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [464, 54, 596, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [740, 54, 753, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [555, 85, 936, 112], \"label\": \"Text\"}, {\"bbox_2d\": [123, 85, 511, 199], \"label\": \"Text\"}, {\"bbox_2d\": [123, 202, 511, 695], \"label\": \"Text\"}, {\"bbox_2d\": [123, 697, 508, 783], \"label\": \"Text\"}, {\"bbox_2d\": [123, 785, 508, 869], \"label\": \"Text\"}, {\"bbox_2d\": [123, 871, 508, 925], \"label\": \"Text\"}, {\"bbox_2d\": [555, 115, 937, 228], \"label\": \"Text\"}, {\"bbox_2d\": [555, 230, 937, 273], \"label\": \"Text\"}, {\"bbox_2d\": [555, 319, 937, 360], \"label\": \"Text\"}, {\"bbox_2d\": [555, 363, 937, 389], \"label\": \"Text\"}, {\"bbox_2d\": [555, 392, 937, 418], \"label\": \"Text\"}, {\"bbox_2d\": [555, 420, 937, 474], \"label\": \"Text\"}, {\"bbox_2d\": [555, 477, 937, 534], \"label\": \"Text\"}, {\"bbox_2d\": [555, 578, 937, 882], \"label\": \"Text\"}, {\"bbox_2d\": [555, 884, 937, 927], \"label\": \"Text\"}, {\"bbox_2d\": [555, 549, 883, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [555, 289, 785, 301], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 988, 533, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [309, 55, 322, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [463, 54, 595, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [740, 55, 753, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 84, 523, 199], \"label\": \"Text\"}, {\"bbox_2d\": [123, 200, 523, 693], \"label\": \"Text\"}, {\"bbox_2d\": [123, 694, 521, 780], \"label\": \"Text\"}, {\"bbox_2d\": [123, 781, 519, 867], \"label\": \"Text\"}, {\"bbox_2d\": [123, 868, 506, 925], \"label\": \"Text\"}, {\"bbox_2d\": [554, 883, 932, 925], \"label\": \"Text\"}, {\"bbox_2d\": [554, 84, 930, 112], \"label\": \"Text\"}, {\"bbox_2d\": [554, 113, 960, 228], \"label\": \"Text\"}, {\"bbox_2d\": [554, 230, 937, 272], \"label\": \"Text\"}, {\"bbox_2d\": [554, 317, 942, 359], \"label\": \"Text\"}, {\"bbox_2d\": [554, 360, 952, 417], \"label\": \"Text\"}, {\"bbox_2d\": [554, 418, 942, 475], \"label\": \"Text\"}, {\"bbox_2d\": [554, 476, 941, 533], \"label\": \"Text\"}, {\"bbox_2d\": [554, 578, 953, 881], \"label\": \"Text\"}, {\"bbox_2d\": [554, 549, 884, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [554, 288, 784, 301], \"label\": \"Section-header\"}, {\"bbox_2d\": [525, 989, 533, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [988, 319, 1013, 700], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 44, 677, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 73, 638, 91], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 628, 679, 646], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 831, 690, 849], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 736, 330, 756], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 769, 212, 786], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 799, 220, 827], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 853, 150, 870], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 897, 182, 914], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 925, 158, 942], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 954, 346, 983], \"label\": \"Text\"}, {\"bbox_2d\": [528, 966, 771, 982], \"label\": \"Text\"}, {\"bbox_2d\": [528, 862, 797, 891], \"label\": \"Text\"}, {\"bbox_2d\": [60, 966, 346, 983], \"label\": \"Text\"}, {\"bbox_2d\": [575, 584, 976, 618], \"label\": \"Text\"}, {\"bbox_2d\": [528, 660, 741, 688], \"label\": \"Text\"}, {\"bbox_2d\": [528, 697, 688, 715], \"label\": \"Text\"}, {\"bbox_2d\": [528, 731, 666, 760], \"label\": \"Text\"}, {\"bbox_2d\": [528, 786, 625, 803], \"label\": \"Text\"}, {\"bbox_2d\": [528, 857, 797, 891], \"label\": \"Text\"}, {\"bbox_2d\": [528, 847, 690, 856], \"label\": \"Text\"}, {\"bbox_2d\": [60, 88, 502, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [528, 106, 718, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 127, 977, 572], \"label\": \"Picture\"}, {\"bbox_2d\": [528, 931, 795, 950], \"label\": \"Section-header\"}, {\"bbox_2d\": [893, 956, 945, 986], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [60, 41, 676, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [56, 71, 974, 985], \"label\": \"Picture\"}, {\"bbox_2d\": [508, 991, 527, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [995, 321, 1014, 689], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 72, 951, 126], \"label\": \"Text\"}, {\"bbox_2d\": [536, 132, 951, 165], \"label\": \"Text\"}, {\"bbox_2d\": [536, 215, 951, 265], \"label\": \"Text\"}, {\"bbox_2d\": [536, 316, 951, 359], \"label\": \"Text\"}, {\"bbox_2d\": [91, 381, 506, 549], \"label\": \"Text\"}, {\"bbox_2d\": [91, 636, 506, 695], \"label\": \"Text\"}, {\"bbox_2d\": [91, 754, 506, 981], \"label\": \"Text\"}, {\"bbox_2d\": [536, 693, 951, 752], \"label\": \"Text\"}, {\"bbox_2d\": [536, 756, 951, 981], \"label\": \"Text\"}, {\"bbox_2d\": [536, 456, 951, 512], \"label\": \"Text\"}, {\"bbox_2d\": [536, 565, 951, 598], \"label\": \"Text\"}, {\"bbox_2d\": [536, 648, 951, 695], \"label\": \"Text\"}, {\"bbox_2d\": [536, 692, 951, 726], \"label\": \"Text\"}, {\"bbox_2d\": [620, 525, 951, 565], \"label\": \"Formula\"}, {\"bbox_2d\": [568, 598, 951, 644], \"label\": \"Formula\"}, {\"bbox_2d\": [152, 559, 506, 619], \"label\": \"Formula\"}, {\"bbox_2d\": [591, 354, 951, 421], \"label\": \"Formula\"}, {\"bbox_2d\": [593, 313, 951, 351], \"label\": \"Formula\"}, {\"bbox_2d\": [616, 276, 951, 309], \"label\": \"Formula\"}, {\"bbox_2d\": [556, 174, 951, 224], \"label\": \"Formula\"}, {\"bbox_2d\": [164, 68, 427, 238], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 260, 506, 340], \"label\": \"Caption\"}, {\"bbox_2d\": [183, 707, 506, 745], \"label\": \"Formula\"}, {\"bbox_2d\": [180, 754, 506, 981], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [536, 72, 951, 128], \"label\": \"Text\"}, {\"bbox_2d\": [536, 132, 951, 159], \"label\": \"Text\"}, {\"bbox_2d\": [943, 38, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 260, 506, 339], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 382, 515, 548], \"label\": \"Text\"}, {\"bbox_2d\": [91, 624, 516, 694], \"label\": \"Text\"}, {\"bbox_2d\": [91, 752, 510, 946], \"label\": \"Text\"}, {\"bbox_2d\": [536, 785, 952, 946], \"label\": \"Text\"}, {\"bbox_2d\": [536, 695, 952, 781], \"label\": \"Text\"}, {\"bbox_2d\": [536, 649, 951, 691], \"label\": \"Text\"}, {\"bbox_2d\": [536, 567, 951, 594], \"label\": \"Text\"}, {\"bbox_2d\": [536, 457, 951, 514], \"label\": \"Text\"}, {\"bbox_2d\": [536, 314, 951, 359], \"label\": \"Text\"}, {\"bbox_2d\": [536, 217, 951, 261], \"label\": \"Text\"}, {\"bbox_2d\": [583, 171, 951, 210], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 277, 955, 324], \"label\": \"Formula\"}, {\"bbox_2d\": [591, 373, 951, 445], \"label\": \"Formula\"}, {\"bbox_2d\": [623, 525, 951, 575], \"label\": \"Formula\"}, {\"bbox_2d\": [570, 605, 951, 639], \"label\": \"Formula\"}, {\"bbox_2d\": [151, 560, 506, 622], \"label\": \"Formula\"}, {\"bbox_2d\": [182, 709, 506, 741], \"label\": \"Formula\"}, {\"bbox_2d\": [164, 64, 431, 244], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [218, 91, 884, 132], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 141, 883, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 175, 308, 187], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 195, 514, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 215, 883, 241], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 250, 454, 262], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 269, 879, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 289, 883, 316], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 324, 883, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 373, 883, 400], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 408, 606, 420], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 428, 883, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 463, 883, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 497, 883, 524], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 532, 883, 573], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 581, 688, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 599, 883, 626], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 634, 883, 661], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 670, 463, 682], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 690, 883, 717], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 725, 883, 751], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 759, 618, 771], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 779, 883, 806], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 814, 788, 826], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 834, 883, 860], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 898, 883, 931], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [218, 91, 884, 132], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 175, 309, 187], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 195, 515, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 215, 884, 241], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 249, 455, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 269, 879, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 289, 884, 315], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 323, 884, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 408, 605, 419], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 427, 884, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 497, 880, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 531, 884, 573], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 581, 687, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 635, 884, 662], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 670, 462, 682], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 690, 884, 716], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 779, 884, 805], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 813, 786, 825], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 833, 884, 860], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 140, 884, 167], \"label\": \"Text\"}, {\"bbox_2d\": [156, 373, 884, 400], \"label\": \"Text\"}, {\"bbox_2d\": [156, 462, 884, 489], \"label\": \"Text\"}, {\"bbox_2d\": [156, 600, 880, 627], \"label\": \"Text\"}, {\"bbox_2d\": [156, 724, 884, 751], \"label\": \"Text\"}, {\"bbox_2d\": [171, 759, 618, 771], \"label\": \"Text\"}, {\"bbox_2d\": [156, 899, 883, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [204, 169, 401, 182], \"label\": \"Section-header\"}, {\"bbox_2d\": [204, 194, 495, 207], \"label\": \"Text\"}, {\"bbox_2d\": [204, 210, 880, 241], \"label\": \"Text\"}, {\"bbox_2d\": [235, 249, 463, 260], \"label\": \"Text\"}, {\"bbox_2d\": [235, 262, 646, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 287, 514, 300], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 307, 879, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 342, 488, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 361, 519, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 381, 637, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 400, 669, 412], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 419, 504, 432], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 439, 356, 451], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 459, 680, 472], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 479, 782, 492], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 499, 714, 512], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 519, 867, 532], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 539, 878, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [204, 575, 880, 604], \"label\": \"Text\"}, {\"bbox_2d\": [235, 609, 406, 621], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 628, 444, 641], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 647, 375, 659], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 666, 416, 679], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 686, 717, 699], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 706, 575, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 726, 552, 739], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 776, 721, 786], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 788, 274, 798], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 800, 634, 810], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 812, 857, 822], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 824, 880, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 856, 734, 867], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 869, 627, 880], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 882, 756, 892], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 894, 791, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 906, 651, 917], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 978, 522, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [205, 170, 405, 182], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 777, 723, 786], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 788, 244, 797], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 799, 668, 808], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 810, 663, 819], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 821, 883, 841], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 843, 883, 863], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 866, 736, 875], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 877, 630, 886], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 888, 759, 897], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 899, 794, 908], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 910, 522, 919], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 921, 655, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [513, 979, 522, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [205, 576, 884, 602], \"label\": \"Text\"}, {\"bbox_2d\": [235, 215, 884, 241], \"label\": \"Text\"}, {\"bbox_2d\": [235, 249, 468, 261], \"label\": \"Text\"}, {\"bbox_2d\": [205, 195, 498, 206], \"label\": \"Text\"}, {\"bbox_2d\": [272, 91, 884, 147], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 269, 651, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 289, 518, 300], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 308, 884, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 343, 492, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 363, 524, 375], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 383, 641, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 402, 674, 414], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 422, 509, 434], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 442, 361, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 462, 685, 473], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 481, 787, 493], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 501, 719, 513], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 521, 872, 533], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 541, 884, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 610, 411, 622], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 630, 448, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 650, 379, 661], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 669, 421, 681], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 689, 722, 701], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 709, 579, 721], \"label\": \"List-item\"}, {\"bbox_2d\": [235, 729, 556, 741], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [238, 129, 906, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [1017, 139, 1027, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 232, 470, 381], \"label\": \"Text\"}, {\"bbox_2d\": [512, 232, 902, 292], \"label\": \"Text\"}, {\"bbox_2d\": [87, 411, 481, 538], \"label\": \"Text\"}, {\"bbox_2d\": [87, 569, 481, 761], \"label\": \"Text\"}, {\"bbox_2d\": [87, 792, 480, 918], \"label\": \"Text\"}, {\"bbox_2d\": [512, 725, 902, 851], \"label\": \"Text\"}, {\"bbox_2d\": [512, 612, 892, 695], \"label\": \"Text\"}, {\"bbox_2d\": [512, 322, 902, 582], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [237, 126, 907, 158], \"label\": \"Page-header\"}, {\"bbox_2d\": [1016, 137, 1028, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 229, 469, 381], \"label\": \"Text\"}, {\"bbox_2d\": [87, 408, 480, 538], \"label\": \"Text\"}, {\"bbox_2d\": [87, 565, 480, 762], \"label\": \"Text\"}, {\"bbox_2d\": [87, 789, 478, 919], \"label\": \"Text\"}, {\"bbox_2d\": [512, 229, 902, 292], \"label\": \"Text\"}, {\"bbox_2d\": [512, 319, 902, 583], \"label\": \"Text\"}, {\"bbox_2d\": [512, 610, 892, 695], \"label\": \"Text\"}, {\"bbox_2d\": [512, 722, 903, 852], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 979, 370, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 84, 915, 602], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [99, 84, 929, 603], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [190, 67, 843, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 138, 840, 239], \"label\": \"Text\"}, {\"bbox_2d\": [151, 250, 840, 419], \"label\": \"Text\"}, {\"bbox_2d\": [151, 430, 720, 445], \"label\": \"Text\"}, {\"bbox_2d\": [151, 483, 840, 498], \"label\": \"Text\"}, {\"bbox_2d\": [151, 802, 321, 888], \"label\": \"Text\"}, {\"bbox_2d\": [151, 914, 840, 947], \"label\": \"Text\"}, {\"bbox_2d\": [907, 945, 925, 960], \"label\": \"Page-footer\"}, {\"bbox_2d\": [893, 962, 997, 1018], \"label\": \"Picture\"}, {\"bbox_2d\": [151, 457, 260, 472], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 508, 429, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [190, 534, 840, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 594, 794, 609], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 620, 573, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 724, 424, 739], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [190, 69, 844, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 140, 838, 238], \"label\": \"Text\"}, {\"bbox_2d\": [151, 252, 838, 419], \"label\": \"Text\"}, {\"bbox_2d\": [151, 433, 720, 445], \"label\": \"Text\"}, {\"bbox_2d\": [151, 485, 838, 497], \"label\": \"Text\"}, {\"bbox_2d\": [151, 459, 261, 471], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 727, 425, 739], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 805, 321, 886], \"label\": \"Text\"}, {\"bbox_2d\": [151, 917, 838, 946], \"label\": \"Text\"}, {\"bbox_2d\": [906, 948, 927, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [880, 960, 1002, 1028], \"label\": \"Picture\"}, {\"bbox_2d\": [151, 511, 428, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [190, 537, 838, 583], \"label\": \"List-item\"}, {\"bbox_2d\": [190, 597, 794, 609], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 623, 574, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [869, 142, 932, 152], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [121, 505, 911, 822], \"label\": \"Text\"}, {\"bbox_2d\": [121, 834, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [121, 505, 911, 822], \"label\": \"Text\"}, {\"bbox_2d\": [121, 834, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [76, 69, 286, 81], \"label\": \"Section-header\"}, {\"bbox_2d\": [108, 99, 921, 204], \"label\": \"Text\"}, {\"bbox_2d\": [108, 218, 921, 354], \"label\": \"Text\"}, {\"bbox_2d\": [108, 367, 796, 378], \"label\": \"Text\"}, {\"bbox_2d\": [76, 393, 923, 666], \"label\": \"Table\"}, {\"bbox_2d\": [317, 685, 922, 942], \"label\": \"Table\"}, {\"bbox_2d\": [317, 685, 485, 695], \"label\": \"Text\"}, {\"bbox_2d\": [76, 901, 922, 941], \"label\": \"Text\"}, {\"bbox_2d\": [966, 984, 979, 996], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [76, 70, 289, 81], \"label\": \"Section-header\"}, {\"bbox_2d\": [107, 98, 920, 206], \"label\": \"Text\"}, {\"bbox_2d\": [107, 219, 920, 354], \"label\": \"Text\"}, {\"bbox_2d\": [107, 368, 884, 379], \"label\": \"Text\"}, {\"bbox_2d\": [318, 395, 485, 404], \"label\": \"Text\"}, {\"bbox_2d\": [318, 410, 408, 420], \"label\": \"Text\"}, {\"bbox_2d\": [318, 687, 485, 697], \"label\": \"Text\"}, {\"bbox_2d\": [318, 703, 408, 713], \"label\": \"Text\"}, {\"bbox_2d\": [75, 431, 919, 664], \"label\": \"Table\"}, {\"bbox_2d\": [966, 982, 981, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [75, 723, 920, 944], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 98, 345, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 126, 332, 136], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 154, 945, 963], \"label\": \"Table\"}, {\"bbox_2d\": [590, 999, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [999, 96, 1012, 309], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 309], \"label\": \"Page-header\"}, {\"bbox_2d\": [591, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 98, 344, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 126, 332, 137], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 152, 947, 961], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [114, 18, 210, 65], \"label\": \"Picture\"}, {\"bbox_2d\": [98, 79, 936, 147], \"label\": \"Text\"}, {\"bbox_2d\": [98, 206, 950, 291], \"label\": \"Text\"}, {\"bbox_2d\": [98, 312, 950, 417], \"label\": \"Text\"}, {\"bbox_2d\": [98, 476, 231, 489], \"label\": \"Text\"}, {\"bbox_2d\": [98, 500, 402, 538], \"label\": \"Text\"}, {\"bbox_2d\": [98, 595, 388, 608], \"label\": \"Text\"}, {\"bbox_2d\": [98, 620, 580, 652], \"label\": \"Text\"}, {\"bbox_2d\": [98, 905, 950, 958], \"label\": \"Footnote\"}, {\"bbox_2d\": [888, 973, 950, 981], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 169, 274, 184], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [98, 79, 940, 130], \"label\": \"Text\"}, {\"bbox_2d\": [98, 133, 793, 148], \"label\": \"Text\"}, {\"bbox_2d\": [98, 205, 954, 292], \"label\": \"Text\"}, {\"bbox_2d\": [98, 313, 954, 418], \"label\": \"Text\"}, {\"bbox_2d\": [98, 474, 236, 489], \"label\": \"Text\"}, {\"bbox_2d\": [98, 501, 406, 516], \"label\": \"Text\"}, {\"bbox_2d\": [103, 523, 192, 538], \"label\": \"Text\"}, {\"bbox_2d\": [98, 593, 394, 608], \"label\": \"Text\"}, {\"bbox_2d\": [98, 619, 580, 651], \"label\": \"Text\"}, {\"bbox_2d\": [98, 169, 279, 184], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 905, 953, 957], \"label\": \"Footnote\"}, {\"bbox_2d\": [887, 974, 949, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [112, 19, 209, 66], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 193], \"label\": \"Text\"}, {\"bbox_2d\": [121, 205, 911, 520], \"label\": \"Text\"}, {\"bbox_2d\": [121, 610, 911, 681], \"label\": \"Text\"}, {\"bbox_2d\": [121, 566, 267, 578], \"label\": \"Section-header\"}, {\"bbox_2d\": [141, 794, 764, 806], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 823, 910, 863], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 879, 911, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 936, 825, 948], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 911, 192], \"label\": \"Text\"}, {\"bbox_2d\": [121, 206, 911, 521], \"label\": \"Text\"}, {\"bbox_2d\": [121, 611, 911, 679], \"label\": \"Text\"}, {\"bbox_2d\": [121, 565, 266, 578], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [140, 794, 763, 807], \"label\": \"List-item\"}, {\"bbox_2d\": [140, 823, 911, 864], \"label\": \"List-item\"}, {\"bbox_2d\": [140, 880, 911, 921], \"label\": \"List-item\"}, {\"bbox_2d\": [140, 936, 826, 949], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [89, 46, 686, 64], \"label\": \"Page-header\"}, {\"bbox_2d\": [718, 49, 812, 61], \"label\": \"Text\"}, {\"bbox_2d\": [88, 165, 951, 263], \"label\": \"Text\"}, {\"bbox_2d\": [88, 274, 950, 359], \"label\": \"Text\"}, {\"bbox_2d\": [91, 369, 617, 381], \"label\": \"Text\"}, {\"bbox_2d\": [90, 398, 946, 810], \"label\": \"Table\"}, {\"bbox_2d\": [511, 988, 529, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [59, 140, 257, 153], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 47, 809, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [65, 141, 257, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 164, 944, 262], \"label\": \"Text\"}, {\"bbox_2d\": [91, 274, 944, 358], \"label\": \"Text\"}, {\"bbox_2d\": [91, 370, 618, 381], \"label\": \"Text\"}, {\"bbox_2d\": [88, 402, 944, 810], \"label\": \"Table\"}, {\"bbox_2d\": [510, 988, 527, 1005], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [138, 35, 320, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [654, 35, 897, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 67, 900, 183], \"label\": \"Text\"}, {\"bbox_2d\": [138, 203, 898, 250], \"label\": \"Text\"}, {\"bbox_2d\": [138, 271, 900, 319], \"label\": \"Text\"}, {\"bbox_2d\": [138, 374, 900, 456], \"label\": \"Text\"}, {\"bbox_2d\": [138, 476, 827, 507], \"label\": \"Text\"}, {\"bbox_2d\": [138, 527, 900, 642], \"label\": \"Text\"}, {\"bbox_2d\": [138, 663, 900, 728], \"label\": \"Text\"}, {\"bbox_2d\": [138, 749, 898, 796], \"label\": \"Text\"}, {\"bbox_2d\": [138, 817, 900, 916], \"label\": \"Text\"}, {\"bbox_2d\": [138, 936, 490, 950], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 340, 559, 354], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 974, 549, 986], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [654, 35, 903, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 35, 322, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 65, 903, 183], \"label\": \"Text\"}, {\"bbox_2d\": [138, 201, 902, 251], \"label\": \"Text\"}, {\"bbox_2d\": [138, 269, 902, 319], \"label\": \"Text\"}, {\"bbox_2d\": [138, 371, 902, 456], \"label\": \"Text\"}, {\"bbox_2d\": [138, 337, 565, 354], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 474, 833, 507], \"label\": \"Text\"}, {\"bbox_2d\": [138, 525, 903, 643], \"label\": \"Text\"}, {\"bbox_2d\": [138, 661, 903, 728], \"label\": \"Text\"}, {\"bbox_2d\": [138, 746, 902, 796], \"label\": \"Text\"}, {\"bbox_2d\": [138, 814, 903, 915], \"label\": \"Text\"}, {\"bbox_2d\": [138, 934, 496, 950], \"label\": \"Section-header\"}, {\"bbox_2d\": [534, 972, 554, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 93, 283, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 682, 345, 697], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 766, 320, 781], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 871, 155], \"label\": \"Text\"}, {\"bbox_2d\": [231, 171, 918, 247], \"label\": \"Text\"}, {\"bbox_2d\": [231, 263, 878, 291], \"label\": \"Text\"}, {\"bbox_2d\": [231, 307, 913, 335], \"label\": \"Text\"}, {\"bbox_2d\": [231, 352, 926, 411], \"label\": \"Text\"}, {\"bbox_2d\": [231, 427, 926, 502], \"label\": \"Text\"}, {\"bbox_2d\": [231, 519, 924, 579], \"label\": \"Text\"}, {\"bbox_2d\": [231, 596, 926, 655], \"label\": \"Text\"}, {\"bbox_2d\": [231, 716, 921, 744], \"label\": \"Text\"}, {\"bbox_2d\": [231, 785, 917, 844], \"label\": \"Text\"}, {\"bbox_2d\": [231, 860, 919, 905], \"label\": \"Text\"}, {\"bbox_2d\": [231, 921, 904, 950], \"label\": \"Text\"}, {\"bbox_2d\": [109, 989, 699, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 93, 283, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 875, 155], \"label\": \"Text\"}, {\"bbox_2d\": [231, 168, 918, 246], \"label\": \"Text\"}, {\"bbox_2d\": [231, 260, 883, 291], \"label\": \"Text\"}, {\"bbox_2d\": [231, 308, 913, 336], \"label\": \"Text\"}, {\"bbox_2d\": [231, 352, 926, 411], \"label\": \"Text\"}, {\"bbox_2d\": [231, 428, 926, 503], \"label\": \"Text\"}, {\"bbox_2d\": [231, 519, 923, 579], \"label\": \"Text\"}, {\"bbox_2d\": [231, 595, 926, 654], \"label\": \"Text\"}, {\"bbox_2d\": [231, 716, 950, 743], \"label\": \"Text\"}, {\"bbox_2d\": [231, 785, 917, 844], \"label\": \"Text\"}, {\"bbox_2d\": [231, 861, 919, 904], \"label\": \"Text\"}, {\"bbox_2d\": [231, 921, 904, 949], \"label\": \"Text\"}, {\"bbox_2d\": [109, 682, 345, 697], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 765, 319, 780], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 988, 694, 1000], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [23, 44, 1011, 256], \"label\": \"Table\"}, {\"bbox_2d\": [514, 292, 521, 301], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [514, 293, 519, 300], \"label\": \"Page-footer\"}, {\"bbox_2d\": [19, 41, 1012, 253], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [464, 53, 596, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 84, 535, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 114, 937, 156], \"label\": \"Text\"}, {\"bbox_2d\": [123, 187, 500, 199], \"label\": \"Text\"}, {\"bbox_2d\": [123, 214, 322, 229], \"label\": \"List-item\"}, {\"bbox_2d\": [521, 987, 538, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [464, 53, 596, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 83, 535, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 185, 500, 200], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 113, 939, 156], \"label\": \"Text\"}, {\"bbox_2d\": [521, 987, 538, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [123, 214, 322, 229], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [382, 126, 653, 139], \"label\": \"Text\"}, {\"bbox_2d\": [182, 160, 845, 169], \"label\": \"Text\"}, {\"bbox_2d\": [125, 252, 911, 488], \"label\": \"Text\"}, {\"bbox_2d\": [125, 507, 911, 544], \"label\": \"Text\"}, {\"bbox_2d\": [125, 621, 911, 929], \"label\": \"Text\"}, {\"bbox_2d\": [125, 941, 795, 950], \"label\": \"Footnote\"}, {\"bbox_2d\": [513, 975, 522, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [125, 583, 317, 598], \"label\": \"Section-header\"}, {\"bbox_2d\": [481, 219, 554, 230], \"label\": \"Section-header\"}, {\"bbox_2d\": [252, 88, 782, 104], \"label\": \"Title\"}, {\"bbox_2d\": [29, 287, 62, 734], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [252, 88, 783, 104], \"label\": \"Title\"}, {\"bbox_2d\": [382, 126, 653, 140], \"label\": \"Text\"}, {\"bbox_2d\": [184, 161, 845, 170], \"label\": \"Text\"}, {\"bbox_2d\": [125, 254, 911, 487], \"label\": \"Text\"}, {\"bbox_2d\": [125, 506, 911, 543], \"label\": \"Text\"}, {\"bbox_2d\": [125, 622, 911, 929], \"label\": \"Text\"}, {\"bbox_2d\": [125, 583, 317, 598], \"label\": \"Section-header\"}, {\"bbox_2d\": [480, 219, 556, 230], \"label\": \"Section-header\"}, {\"bbox_2d\": [144, 940, 796, 949], \"label\": \"Footnote\"}, {\"bbox_2d\": [513, 974, 522, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [31, 289, 63, 732], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 922, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 78, 988, 965], \"label\": \"Table\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 77, 989, 965], \"label\": \"Table\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [543, 206, 952, 239], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [544, 206, 952, 240], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [78, 52, 282, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 95, 490, 185], \"label\": \"Text\"}, {\"bbox_2d\": [78, 191, 492, 673], \"label\": \"Text\"}, {\"bbox_2d\": [78, 675, 492, 927], \"label\": \"Text\"}, {\"bbox_2d\": [78, 932, 492, 968], \"label\": \"Text\"}, {\"bbox_2d\": [529, 93, 941, 239], \"label\": \"Text\"}, {\"bbox_2d\": [529, 245, 942, 559], \"label\": \"Text\"}, {\"bbox_2d\": [529, 563, 942, 729], \"label\": \"Text\"}, {\"bbox_2d\": [529, 733, 944, 968], \"label\": \"Text\"}, {\"bbox_2d\": [500, 979, 526, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [79, 51, 295, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [529, 92, 953, 242], \"label\": \"Text\"}, {\"bbox_2d\": [77, 92, 499, 186], \"label\": \"Text\"}, {\"bbox_2d\": [77, 186, 507, 673], \"label\": \"Text\"}, {\"bbox_2d\": [529, 242, 953, 560], \"label\": \"Text\"}, {\"bbox_2d\": [530, 561, 956, 729], \"label\": \"Text\"}, {\"bbox_2d\": [79, 674, 502, 928], \"label\": \"Text\"}, {\"bbox_2d\": [530, 730, 956, 968], \"label\": \"Text\"}, {\"bbox_2d\": [80, 930, 500, 968], \"label\": \"Text\"}, {\"bbox_2d\": [500, 977, 539, 994], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 150, 911, 248], \"label\": \"Text\"}, {\"bbox_2d\": [121, 319, 911, 363], \"label\": \"Text\"}, {\"bbox_2d\": [121, 434, 649, 449], \"label\": \"Text\"}, {\"bbox_2d\": [121, 523, 911, 621], \"label\": \"Text\"}, {\"bbox_2d\": [121, 708, 911, 944], \"label\": \"Text\"}, {\"bbox_2d\": [121, 665, 310, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [418, 260, 911, 314], \"label\": \"Formula\"}, {\"bbox_2d\": [421, 380, 911, 419], \"label\": \"Formula\"}, {\"bbox_2d\": [381, 465, 911, 505], \"label\": \"Formula\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 146, 911, 193], \"label\": \"Text\"}, {\"bbox_2d\": [121, 204, 911, 248], \"label\": \"Text\"}, {\"bbox_2d\": [416, 262, 911, 332], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 320, 911, 363], \"label\": \"Text\"}, {\"bbox_2d\": [121, 434, 648, 449], \"label\": \"Text\"}, {\"bbox_2d\": [417, 380, 911, 427], \"label\": \"Formula\"}, {\"bbox_2d\": [380, 463, 911, 534], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 523, 911, 621], \"label\": \"Text\"}, {\"bbox_2d\": [121, 709, 911, 944], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 665, 310, 678], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [530, 97, 947, 235], \"label\": \"Text\"}, {\"bbox_2d\": [77, 99, 502, 217], \"label\": \"Text\"}, {\"bbox_2d\": [77, 225, 497, 309], \"label\": \"Text\"}, {\"bbox_2d\": [77, 344, 508, 692], \"label\": \"Text\"}, {\"bbox_2d\": [77, 699, 482, 766], \"label\": \"Text\"}, {\"bbox_2d\": [77, 801, 504, 954], \"label\": \"Text\"}, {\"bbox_2d\": [529, 694, 961, 883], \"label\": \"Text\"}, {\"bbox_2d\": [529, 559, 959, 659], \"label\": \"Text\"}, {\"bbox_2d\": [529, 286, 950, 459], \"label\": \"Text\"}, {\"bbox_2d\": [529, 248, 845, 279], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 323, 295, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [529, 672, 646, 687], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 779, 285, 794], \"label\": \"Section-header\"}, {\"bbox_2d\": [616, 1001, 959, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [529, 466, 964, 536], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [76, 97, 505, 308], \"label\": \"Text\"}, {\"bbox_2d\": [76, 342, 509, 765], \"label\": \"Text\"}, {\"bbox_2d\": [76, 800, 506, 955], \"label\": \"Text\"}, {\"bbox_2d\": [530, 693, 962, 883], \"label\": \"Text\"}, {\"bbox_2d\": [530, 557, 958, 659], \"label\": \"Text\"}, {\"bbox_2d\": [530, 286, 947, 458], \"label\": \"Text\"}, {\"bbox_2d\": [530, 97, 948, 234], \"label\": \"Text\"}, {\"bbox_2d\": [76, 320, 300, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [530, 246, 842, 280], \"label\": \"Section-header\"}, {\"bbox_2d\": [530, 671, 651, 687], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 777, 289, 794], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 463, 967, 538], \"label\": \"Table\"}, {\"bbox_2d\": [615, 999, 959, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [103, 67, 843, 101], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 198, 271, 211], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 510, 255, 523], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 697, 293, 710], \"label\": \"Section-header\"}, {\"bbox_2d\": [142, 721, 407, 751], \"label\": \"Text\"}, {\"bbox_2d\": [142, 356, 414, 388], \"label\": \"Text\"}, {\"bbox_2d\": [142, 400, 483, 499], \"label\": \"Text\"}, {\"bbox_2d\": [142, 531, 844, 751], \"label\": \"Table\"}, {\"bbox_2d\": [142, 138, 844, 347], \"label\": \"Table\"}, {\"bbox_2d\": [903, 944, 925, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [891, 961, 1000, 1020], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [103, 69, 842, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [100, 139, 844, 753], \"label\": \"Table\"}, {\"bbox_2d\": [906, 948, 927, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [888, 960, 999, 1021], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 108, 887, 224], \"label\": \"Text\"}, {\"bbox_2d\": [147, 260, 886, 402], \"label\": \"Text\"}, {\"bbox_2d\": [147, 516, 886, 607], \"label\": \"Text\"}, {\"bbox_2d\": [147, 643, 887, 733], \"label\": \"Text\"}, {\"bbox_2d\": [147, 847, 887, 912], \"label\": \"Text\"}, {\"bbox_2d\": [147, 770, 889, 811], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [148, 110, 889, 223], \"label\": \"Text\"}, {\"bbox_2d\": [148, 439, 887, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 516, 886, 605], \"label\": \"Text\"}, {\"bbox_2d\": [148, 643, 888, 732], \"label\": \"Text\"}, {\"bbox_2d\": [147, 261, 887, 401], \"label\": \"Text\"}, {\"bbox_2d\": [149, 770, 886, 809], \"label\": \"Text\"}, {\"bbox_2d\": [148, 847, 886, 911], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [241, 103, 892, 121], \"label\": \"Text\"}, {\"bbox_2d\": [241, 125, 788, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [241, 147, 916, 181], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 202, 926, 234], \"label\": \"Text\"}, {\"bbox_2d\": [231, 238, 346, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 256, 385, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 273, 348, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 299, 926, 348], \"label\": \"Text\"}, {\"bbox_2d\": [231, 555, 926, 602], \"label\": \"Text\"}, {\"bbox_2d\": [231, 520, 551, 535], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 790, 926, 823], \"label\": \"Text\"}, {\"bbox_2d\": [231, 836, 613, 850], \"label\": \"Text\"}, {\"bbox_2d\": [231, 858, 338, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 610, 431, 624], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 624, 411, 639], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 639, 911, 656], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 656, 551, 673], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 673, 321, 687], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 687, 599, 703], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 703, 338, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 719, 338, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 734, 338, 759], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 881, 926, 943], \"label\": \"Text\"}, {\"bbox_2d\": [109, 757, 337, 777], \"label\": \"Section-header\"}, {\"bbox_2d\": [636, 985, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 365, 928, 521], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [241, 107, 889, 119], \"label\": \"Text\"}, {\"bbox_2d\": [231, 206, 926, 233], \"label\": \"Text\"}, {\"bbox_2d\": [231, 304, 927, 347], \"label\": \"Text\"}, {\"bbox_2d\": [231, 557, 926, 601], \"label\": \"Text\"}, {\"bbox_2d\": [231, 794, 926, 822], \"label\": \"Text\"}, {\"bbox_2d\": [231, 839, 614, 851], \"label\": \"Text\"}, {\"bbox_2d\": [255, 883, 926, 942], \"label\": \"Text\"}, {\"bbox_2d\": [635, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 861, 334, 873], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 721, 334, 733], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 705, 335, 717], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 690, 594, 702], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 674, 317, 686], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 658, 544, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 643, 907, 655], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 627, 407, 639], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 611, 424, 623], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 275, 343, 287], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 259, 380, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 243, 336, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [241, 152, 932, 180], \"label\": \"List-item\"}, {\"bbox_2d\": [241, 130, 786, 142], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 365, 929, 518], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 523, 546, 534], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 760, 332, 776], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [699, 93, 721, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 326, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [157, 123, 397, 708], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 111, 895, 148], \"label\": \"Text\"}, {\"bbox_2d\": [521, 148, 895, 195], \"label\": \"Text\"}, {\"bbox_2d\": [521, 199, 895, 288], \"label\": \"Text\"}, {\"bbox_2d\": [521, 293, 895, 358], \"label\": \"Text\"}, {\"bbox_2d\": [521, 362, 895, 402], \"label\": \"Text\"}, {\"bbox_2d\": [130, 730, 500, 769], \"label\": \"Text\"}, {\"bbox_2d\": [130, 775, 500, 813], \"label\": \"Text\"}, {\"bbox_2d\": [130, 817, 500, 896], \"label\": \"Text\"}, {\"bbox_2d\": [130, 903, 500, 982], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [699, 92, 721, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 74, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 327, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 125, 351, 135], \"label\": \"Text\"}, {\"bbox_2d\": [526, 110, 895, 147], \"label\": \"Text\"}, {\"bbox_2d\": [526, 150, 896, 199], \"label\": \"Text\"}, {\"bbox_2d\": [524, 203, 895, 291], \"label\": \"Text\"}, {\"bbox_2d\": [526, 296, 896, 358], \"label\": \"Text\"}, {\"bbox_2d\": [526, 362, 894, 398], \"label\": \"Text\"}, {\"bbox_2d\": [131, 731, 499, 769], \"label\": \"Text\"}, {\"bbox_2d\": [131, 775, 501, 812], \"label\": \"Text\"}, {\"bbox_2d\": [130, 818, 501, 897], \"label\": \"Text\"}, {\"bbox_2d\": [130, 903, 501, 982], \"label\": \"Text\"}, {\"bbox_2d\": [159, 135, 499, 240], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 241, 499, 375], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 375, 499, 487], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 489, 499, 601], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 602, 499, 704], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 894, 893, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 820, 893, 894], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 740, 893, 818], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 656, 894, 739], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 570, 892, 654], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 479, 893, 568], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 425, 893, 479], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [456, 53, 604, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [320, 73, 345, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [716, 73, 740, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [179, 92, 517, 119], \"label\": \"Text\"}, {\"bbox_2d\": [547, 92, 912, 132], \"label\": \"Text\"}, {\"bbox_2d\": [547, 132, 707, 143], \"label\": \"Text\"}, {\"bbox_2d\": [547, 143, 912, 195], \"label\": \"Text\"}, {\"bbox_2d\": [558, 195, 829, 222], \"label\": \"Text\"}, {\"bbox_2d\": [193, 206, 517, 327], \"label\": \"Text\"}, {\"bbox_2d\": [547, 222, 912, 306], \"label\": \"Text\"}, {\"bbox_2d\": [677, 315, 780, 326], \"label\": \"Text\"}, {\"bbox_2d\": [179, 133, 372, 190], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [320, 75, 343, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [456, 56, 604, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [715, 76, 740, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 319, 781, 333], \"label\": \"Text\"}, {\"bbox_2d\": [542, 225, 912, 300], \"label\": \"Text\"}, {\"bbox_2d\": [543, 198, 912, 221], \"label\": \"Text\"}, {\"bbox_2d\": [543, 172, 912, 195], \"label\": \"Text\"}, {\"bbox_2d\": [543, 146, 912, 169], \"label\": \"Text\"}, {\"bbox_2d\": [178, 94, 511, 117], \"label\": \"List-item\"}, {\"bbox_2d\": [193, 209, 517, 327], \"label\": \"List-item\"}, {\"bbox_2d\": [573, 133, 703, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [573, 94, 912, 130], \"label\": \"List-item\"}, {\"bbox_2d\": [174, 133, 374, 188], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 43, 385, 59], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 102, 232, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 73, 123, 86], \"label\": \"Text\"}, {\"bbox_2d\": [85, 133, 962, 160], \"label\": \"Text\"}, {\"bbox_2d\": [85, 314, 557, 326], \"label\": \"Text\"}, {\"bbox_2d\": [85, 527, 653, 538], \"label\": \"Text\"}, {\"bbox_2d\": [85, 553, 963, 692], \"label\": \"Text\"}, {\"bbox_2d\": [85, 707, 962, 831], \"label\": \"Text\"}, {\"bbox_2d\": [85, 846, 963, 953], \"label\": \"Text\"}, {\"bbox_2d\": [85, 999, 98, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 175, 964, 298], \"label\": \"Table\"}, {\"bbox_2d\": [85, 340, 963, 525], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [85, 38, 387, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 97, 232, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 71, 123, 87], \"label\": \"Text\"}, {\"bbox_2d\": [85, 130, 966, 161], \"label\": \"Text\"}, {\"bbox_2d\": [83, 173, 969, 300], \"label\": \"Table\"}, {\"bbox_2d\": [85, 335, 967, 525], \"label\": \"Table\"}, {\"bbox_2d\": [85, 311, 560, 327], \"label\": \"Text\"}, {\"bbox_2d\": [85, 525, 654, 537], \"label\": \"Text\"}, {\"bbox_2d\": [85, 550, 968, 693], \"label\": \"Text\"}, {\"bbox_2d\": [85, 705, 961, 832], \"label\": \"Text\"}, {\"bbox_2d\": [85, 844, 968, 955], \"label\": \"Text\"}, {\"bbox_2d\": [85, 997, 100, 1011], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [194, 38, 841, 47], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 85, 182, 96], \"label\": \"Section-header\"}, {\"bbox_2d\": [42, 101, 717, 111], \"label\": \"Text\"}, {\"bbox_2d\": [25, 456, 1009, 478], \"label\": \"Text\"}, {\"bbox_2d\": [25, 487, 1009, 541], \"label\": \"Text\"}, {\"bbox_2d\": [49, 552, 911, 562], \"label\": \"Text\"}, {\"bbox_2d\": [25, 572, 1009, 605], \"label\": \"Text\"}, {\"bbox_2d\": [49, 615, 957, 625], \"label\": \"Text\"}, {\"bbox_2d\": [504, 648, 529, 659], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 122, 1009, 442], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [193, 37, 836, 48], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 84, 182, 95], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 100, 717, 111], \"label\": \"Text\"}, {\"bbox_2d\": [49, 552, 911, 562], \"label\": \"Text\"}, {\"bbox_2d\": [49, 614, 984, 625], \"label\": \"Text\"}, {\"bbox_2d\": [25, 455, 1009, 477], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 486, 1009, 543], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 571, 1009, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [504, 648, 529, 659], \"label\": \"Page-footer\"}, {\"bbox_2d\": [21, 121, 1009, 441], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [228, 56, 959, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [605, 141, 959, 164], \"label\": \"Section-header\"}, {\"bbox_2d\": [352, 288, 684, 303], \"label\": \"Section-header\"}, {\"bbox_2d\": [51, 321, 276, 409], \"label\": \"Picture\"}, {\"bbox_2d\": [147, 438, 451, 465], \"label\": \"Text\"}, {\"bbox_2d\": [147, 484, 962, 556], \"label\": \"Text\"}, {\"bbox_2d\": [147, 574, 962, 661], \"label\": \"Text\"}, {\"bbox_2d\": [147, 679, 962, 736], \"label\": \"Text\"}, {\"bbox_2d\": [147, 754, 962, 797], \"label\": \"Text\"}, {\"bbox_2d\": [152, 839, 303, 857], \"label\": \"Picture\"}, {\"bbox_2d\": [147, 907, 283, 934], \"label\": \"Text\"}, {\"bbox_2d\": [780, 989, 975, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [352, 287, 683, 303], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 439, 451, 466], \"label\": \"Text\"}, {\"bbox_2d\": [147, 484, 973, 556], \"label\": \"Text\"}, {\"bbox_2d\": [147, 574, 959, 661], \"label\": \"Text\"}, {\"bbox_2d\": [147, 679, 966, 736], \"label\": \"Text\"}, {\"bbox_2d\": [147, 755, 959, 797], \"label\": \"Text\"}, {\"bbox_2d\": [781, 990, 974, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 907, 283, 934], \"label\": \"Text\"}, {\"bbox_2d\": [151, 832, 302, 858], \"label\": \"Picture\"}, {\"bbox_2d\": [228, 56, 957, 109], \"label\": \"Page-header\"}, {\"bbox_2d\": [54, 336, 267, 401], \"label\": \"Picture\"}, {\"bbox_2d\": [605, 142, 957, 166], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [631, 123, 961, 134], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 165, 887, 178], \"label\": \"Text\"}, {\"bbox_2d\": [125, 272, 962, 332], \"label\": \"Text\"}, {\"bbox_2d\": [125, 341, 962, 385], \"label\": \"Text\"}, {\"bbox_2d\": [125, 424, 962, 452], \"label\": \"Text\"}, {\"bbox_2d\": [125, 461, 962, 490], \"label\": \"Text\"}, {\"bbox_2d\": [125, 500, 962, 528], \"label\": \"Text\"}, {\"bbox_2d\": [125, 538, 962, 566], \"label\": \"Text\"}, {\"bbox_2d\": [125, 576, 962, 622], \"label\": \"Text\"}, {\"bbox_2d\": [125, 661, 962, 727], \"label\": \"Text\"}, {\"bbox_2d\": [125, 741, 344, 756], \"label\": \"Text\"}, {\"bbox_2d\": [125, 767, 962, 828], \"label\": \"Text\"}, {\"bbox_2d\": [125, 866, 962, 911], \"label\": \"Text\"}, {\"bbox_2d\": [125, 921, 962, 948], \"label\": \"Text\"}, {\"bbox_2d\": [867, 987, 960, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [459, 987, 630, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [125, 845, 432, 859], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 641, 688, 655], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 405, 538, 419], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 213, 406, 228], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 253, 488, 266], \"label\": \"Section-header\"}, {\"bbox_2d\": [145, 61, 373, 115], \"label\": \"Picture\"}, {\"bbox_2d\": [144, 974, 189, 1001], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [136, 53, 374, 123], \"label\": \"Picture\"}, {\"bbox_2d\": [628, 122, 957, 134], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 162, 890, 178], \"label\": \"Text\"}, {\"bbox_2d\": [125, 269, 967, 331], \"label\": \"Text\"}, {\"bbox_2d\": [125, 339, 967, 385], \"label\": \"Text\"}, {\"bbox_2d\": [125, 422, 967, 452], \"label\": \"Text\"}, {\"bbox_2d\": [125, 460, 967, 491], \"label\": \"Text\"}, {\"bbox_2d\": [125, 499, 967, 530], \"label\": \"Text\"}, {\"bbox_2d\": [125, 537, 967, 568], \"label\": \"Text\"}, {\"bbox_2d\": [125, 576, 967, 622], \"label\": \"Text\"}, {\"bbox_2d\": [125, 658, 967, 727], \"label\": \"Text\"}, {\"bbox_2d\": [125, 742, 344, 757], \"label\": \"Text\"}, {\"bbox_2d\": [125, 765, 967, 827], \"label\": \"Text\"}, {\"bbox_2d\": [125, 863, 967, 910], \"label\": \"Text\"}, {\"bbox_2d\": [125, 917, 967, 948], \"label\": \"Text\"}, {\"bbox_2d\": [458, 987, 631, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [139, 974, 194, 1003], \"label\": \"Picture\"}, {\"bbox_2d\": [169, 842, 439, 860], \"label\": \"Section-header\"}, {\"bbox_2d\": [169, 637, 694, 655], \"label\": \"Section-header\"}, {\"bbox_2d\": [169, 401, 546, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [169, 248, 494, 266], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 208, 408, 230], \"label\": \"Section-header\"}, {\"bbox_2d\": [867, 987, 966, 1000], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [156, 91, 354, 103], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 166, 295, 178], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 225, 295, 237], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 116, 878, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 190, 818, 202], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 248, 879, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 282, 590, 294], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 303, 613, 315], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 324, 648, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 344, 633, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [221, 364, 765, 375], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 383, 860, 394], \"label\": \"Text\"}, {\"bbox_2d\": [186, 403, 590, 415], \"label\": \"List-item\"}, {\"bbox_2d\": [226, 424, 613, 435], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 444, 648, 455], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 464, 633, 474], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 483, 767, 494], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 536, 852, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 557, 627, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 577, 648, 588], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 597, 757, 608], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 617, 393, 628], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 640, 879, 694], \"label\": \"Text\"}, {\"bbox_2d\": [756, 733, 881, 745], \"label\": \"Section-header\"}, {\"bbox_2d\": [435, 730, 601, 745], \"label\": \"Section-header\"}, {\"bbox_2d\": [379, 758, 655, 773], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 808, 831, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 827, 879, 885], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 909, 879, 932], \"label\": \"Footnote\"}, {\"bbox_2d\": [509, 979, 526, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [156, 91, 359, 103], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 165, 299, 177], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 225, 299, 236], \"label\": \"Section-header\"}, {\"bbox_2d\": [434, 730, 610, 746], \"label\": \"Section-header\"}, {\"bbox_2d\": [380, 758, 662, 774], \"label\": \"Section-header\"}, {\"bbox_2d\": [757, 734, 886, 745], \"label\": \"Text\"}, {\"bbox_2d\": [156, 116, 884, 142], \"label\": \"Text\"}, {\"bbox_2d\": [171, 190, 823, 201], \"label\": \"Text\"}, {\"bbox_2d\": [156, 249, 884, 276], \"label\": \"Text\"}, {\"bbox_2d\": [156, 382, 866, 394], \"label\": \"Text\"}, {\"bbox_2d\": [156, 639, 884, 696], \"label\": \"Text\"}, {\"bbox_2d\": [171, 808, 835, 819], \"label\": \"Text\"}, {\"bbox_2d\": [156, 827, 884, 884], \"label\": \"Text\"}, {\"bbox_2d\": [156, 910, 883, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [186, 284, 595, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 303, 617, 315], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 323, 652, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 343, 636, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 363, 770, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 402, 595, 414], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 422, 617, 434], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 442, 652, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 462, 636, 473], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 481, 771, 493], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 501, 884, 528], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 536, 857, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 556, 631, 567], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 575, 652, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 595, 762, 607], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 615, 398, 627], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [295, 95, 726, 363], \"label\": \"Text\"}, {\"bbox_2d\": [275, 411, 898, 441], \"label\": \"Text\"}, {\"bbox_2d\": [384, 480, 911, 534], \"label\": \"Text\"}, {\"bbox_2d\": [384, 541, 906, 595], \"label\": \"List-item\"}, {\"bbox_2d\": [384, 602, 898, 630], \"label\": \"List-item\"}, {\"bbox_2d\": [384, 640, 910, 680], \"label\": \"Text\"}, {\"bbox_2d\": [275, 686, 892, 739], \"label\": \"Text\"}, {\"bbox_2d\": [275, 745, 840, 755], \"label\": \"Text\"}, {\"bbox_2d\": [275, 761, 887, 799], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 454, 466, 468], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [472, 94, 648, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [297, 117, 689, 129], \"label\": \"Text\"}, {\"bbox_2d\": [297, 140, 472, 152], \"label\": \"Text\"}, {\"bbox_2d\": [309, 164, 730, 294], \"label\": \"Table\"}, {\"bbox_2d\": [297, 352, 810, 364], \"label\": \"Text\"}, {\"bbox_2d\": [273, 412, 899, 441], \"label\": \"Text\"}, {\"bbox_2d\": [273, 453, 912, 804], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [128, 18, 256, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [45, 14, 84, 25], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 100, 440, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 449, 703, 459], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 465, 255, 475], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 670, 186, 680], \"label\": \"Section-header\"}, {\"bbox_2d\": [45, 1001, 170, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [113, 925, 887, 944], \"label\": \"Text\"}, {\"bbox_2d\": [113, 944, 483, 953], \"label\": \"Text\"}, {\"bbox_2d\": [113, 955, 865, 984], \"label\": \"Text\"}, {\"bbox_2d\": [113, 588, 887, 597], \"label\": \"Text\"}, {\"bbox_2d\": [113, 608, 865, 622], \"label\": \"Text\"}, {\"bbox_2d\": [113, 628, 865, 649], \"label\": \"Text\"}, {\"bbox_2d\": [113, 419, 346, 428], \"label\": \"Text\"}, {\"bbox_2d\": [113, 384, 713, 393], \"label\": \"Text\"}, {\"bbox_2d\": [113, 398, 891, 408], \"label\": \"Text\"}, {\"bbox_2d\": [113, 380, 337, 389], \"label\": \"Text\"}, {\"bbox_2d\": [87, 114, 919, 230], \"label\": \"Table\"}, {\"bbox_2d\": [87, 254, 919, 370], \"label\": \"Table\"}, {\"bbox_2d\": [87, 478, 919, 582], \"label\": \"Table\"}, {\"bbox_2d\": [87, 685, 919, 917], \"label\": \"Table\"}, {\"bbox_2d\": [87, 496, 919, 579], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 255, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 101, 440, 112], \"label\": \"List-item\"}, {\"bbox_2d\": [72, 450, 703, 460], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 466, 255, 476], \"label\": \"List-item\"}, {\"bbox_2d\": [89, 378, 340, 387], \"label\": \"List-item\"}, {\"bbox_2d\": [122, 389, 741, 397], \"label\": \"List-item\"}, {\"bbox_2d\": [122, 399, 896, 418], \"label\": \"List-item\"}, {\"bbox_2d\": [122, 420, 457, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 589, 891, 608], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 610, 482, 618], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 621, 876, 650], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 671, 187, 681], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 924, 891, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 945, 482, 954], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 956, 876, 985], \"label\": \"List-item\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [110, 684, 915, 916], \"label\": \"Table\"}, {\"bbox_2d\": [111, 479, 914, 581], \"label\": \"Table\"}, {\"bbox_2d\": [87, 115, 916, 229], \"label\": \"Table\"}, {\"bbox_2d\": [86, 254, 914, 370], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [140, 107, 924, 640], \"label\": \"Picture\"}, {\"bbox_2d\": [141, 710, 927, 895], \"label\": \"Picture\"}, {\"bbox_2d\": [348, 909, 711, 927], \"label\": \"Caption\"}, {\"bbox_2d\": [509, 988, 546, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [725, 59, 963, 118], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [724, 58, 974, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [139, 101, 939, 898], \"label\": \"Picture\"}, {\"bbox_2d\": [348, 907, 726, 923], \"label\": \"Caption\"}, {\"bbox_2d\": [514, 987, 546, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [192, 130, 830, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [132, 170, 895, 224], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 231, 871, 284], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 392, 895, 445], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 452, 809, 485], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 493, 899, 608], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 615, 901, 709], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 735, 645, 748], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 755, 872, 809], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 896, 736, 909], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 372, 299, 384], \"label\": \"Text\"}, {\"bbox_2d\": [192, 312, 874, 345], \"label\": \"Section-header\"}, {\"bbox_2d\": [192, 838, 872, 868], \"label\": \"Section-header\"}, {\"bbox_2d\": [507, 958, 528, 974], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [190, 126, 838, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 308, 872, 344], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 833, 872, 869], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 368, 305, 385], \"label\": \"Text\"}, {\"bbox_2d\": [133, 167, 902, 223], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 227, 876, 284], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 389, 902, 445], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 449, 814, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 490, 903, 607], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 611, 907, 707], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 752, 877, 808], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 732, 652, 748], \"label\": \"Text\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}, {\"bbox_2d\": [133, 893, 743, 909], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 107, 1011, 213], \"label\": \"Table\"}, {\"bbox_2d\": [25, 226, 514, 237], \"label\": \"Text\"}, {\"bbox_2d\": [26, 242, 1012, 382], \"label\": \"Table\"}, {\"bbox_2d\": [25, 396, 350, 407], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 411, 626, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 426, 606, 437], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 441, 233, 450], \"label\": \"List-item\"}, {\"bbox_2d\": [508, 523, 525, 533], \"label\": \"Page-footer\"}, {\"bbox_2d\": [49, 94, 391, 104], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [49, 93, 391, 104], \"label\": \"Text\"}, {\"bbox_2d\": [508, 522, 525, 533], \"label\": \"Page-footer\"}, {\"bbox_2d\": [22, 237, 1014, 383], \"label\": \"Table\"}, {\"bbox_2d\": [61, 227, 514, 238], \"label\": \"Text\"}, {\"bbox_2d\": [23, 107, 1013, 217], \"label\": \"Table\"}, {\"bbox_2d\": [25, 396, 349, 407], \"label\": \"Footnote\"}, {\"bbox_2d\": [25, 411, 626, 421], \"label\": \"Footnote\"}, {\"bbox_2d\": [25, 425, 607, 436], \"label\": \"Footnote\"}, {\"bbox_2d\": [25, 439, 232, 450], \"label\": \"Footnote\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 600, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 132, 669, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 165, 913, 225], \"label\": \"Text\"}, {\"bbox_2d\": [121, 241, 913, 286], \"label\": \"Text\"}, {\"bbox_2d\": [121, 303, 913, 333], \"label\": \"Text\"}, {\"bbox_2d\": [121, 399, 913, 444], \"label\": \"Text\"}, {\"bbox_2d\": [121, 465, 913, 509], \"label\": \"Text\"}, {\"bbox_2d\": [121, 546, 913, 621], \"label\": \"Text\"}, {\"bbox_2d\": [121, 640, 913, 732], \"label\": \"Text\"}, {\"bbox_2d\": [121, 769, 913, 861], \"label\": \"Text\"}, {\"bbox_2d\": [121, 753, 484, 764], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 528, 353, 540], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 382, 383, 394], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 353, 302, 365], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 942, 536, 953], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 605, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 674, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 349, 307, 366], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 525, 356, 541], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 381, 387, 398], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 749, 919, 861], \"label\": \"Text\"}, {\"bbox_2d\": [121, 637, 919, 733], \"label\": \"Text\"}, {\"bbox_2d\": [121, 541, 919, 621], \"label\": \"Text\"}, {\"bbox_2d\": [121, 461, 918, 510], \"label\": \"Text\"}, {\"bbox_2d\": [121, 397, 918, 446], \"label\": \"Text\"}, {\"bbox_2d\": [121, 301, 918, 334], \"label\": \"Text\"}, {\"bbox_2d\": [121, 237, 918, 286], \"label\": \"Text\"}, {\"bbox_2d\": [121, 161, 919, 225], \"label\": \"Text\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 99, 911, 152], \"label\": \"Text\"}, {\"bbox_2d\": [121, 197, 305, 210], \"label\": \"Text\"}, {\"bbox_2d\": [121, 296, 164, 309], \"label\": \"Text\"}, {\"bbox_2d\": [121, 450, 671, 463], \"label\": \"Text\"}, {\"bbox_2d\": [121, 548, 911, 590], \"label\": \"Text\"}, {\"bbox_2d\": [121, 573, 911, 677], \"label\": \"Text\"}, {\"bbox_2d\": [121, 758, 911, 830], \"label\": \"Text\"}, {\"bbox_2d\": [121, 894, 911, 921], \"label\": \"Text\"}, {\"bbox_2d\": [121, 340, 731, 383], \"label\": \"Text\"}, {\"bbox_2d\": [172, 397, 808, 416], \"label\": \"Text\"}, {\"bbox_2d\": [286, 490, 639, 530], \"label\": \"Formula\"}, {\"bbox_2d\": [296, 697, 639, 740], \"label\": \"Formula\"}, {\"bbox_2d\": [288, 827, 639, 878], \"label\": \"Formula\"}, {\"bbox_2d\": [508, 971, 527, 984], \"label\": \"Page-footer\"}, {\"bbox_2d\": [286, 237, 911, 291], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 92, 912, 125], \"label\": \"Text\"}, {\"bbox_2d\": [121, 198, 306, 209], \"label\": \"Text\"}, {\"bbox_2d\": [121, 296, 165, 307], \"label\": \"Text\"}, {\"bbox_2d\": [155, 451, 669, 463], \"label\": \"Text\"}, {\"bbox_2d\": [121, 549, 914, 589], \"label\": \"Text\"}, {\"bbox_2d\": [121, 601, 914, 675], \"label\": \"Text\"}, {\"bbox_2d\": [121, 762, 914, 810], \"label\": \"Text\"}, {\"bbox_2d\": [121, 894, 842, 913], \"label\": \"Text\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [317, 831, 718, 871], \"label\": \"Formula\"}, {\"bbox_2d\": [295, 696, 740, 737], \"label\": \"Formula\"}, {\"bbox_2d\": [286, 485, 749, 526], \"label\": \"Formula\"}, {\"bbox_2d\": [228, 339, 807, 415], \"label\": \"Formula\"}, {\"bbox_2d\": [315, 232, 914, 272], \"label\": \"Formula\"}, {\"bbox_2d\": [401, 149, 634, 162], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [968, 85, 975, 236], \"label\": \"Page-header\"}, {\"bbox_2d\": [34, 501, 59, 544], \"label\": \"Page-header\"}, {\"bbox_2d\": [137, 381, 906, 600], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [76, 86, 951, 964], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [65, 70, 271, 99], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 324, 380, 339], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 115, 476, 240], \"label\": \"Text\"}, {\"bbox_2d\": [65, 244, 475, 314], \"label\": \"Text\"}, {\"bbox_2d\": [65, 348, 475, 565], \"label\": \"Text\"}, {\"bbox_2d\": [65, 570, 475, 679], \"label\": \"Text\"}, {\"bbox_2d\": [515, 115, 930, 297], \"label\": \"Text\"}, {\"bbox_2d\": [515, 301, 930, 649], \"label\": \"Text\"}, {\"bbox_2d\": [515, 836, 930, 983], \"label\": \"Text\"}, {\"bbox_2d\": [65, 875, 476, 983], \"label\": \"Text\"}, {\"bbox_2d\": [64, 678, 367, 867], \"label\": \"Text\"}, {\"bbox_2d\": [482, 997, 510, 1006], \"label\": \"Page-footer\"}, {\"bbox_2d\": [514, 661, 930, 815], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [67, 69, 290, 100], \"label\": \"Page-header\"}, {\"bbox_2d\": [514, 110, 937, 296], \"label\": \"Text\"}, {\"bbox_2d\": [62, 111, 486, 241], \"label\": \"Text\"}, {\"bbox_2d\": [62, 242, 492, 316], \"label\": \"Text\"}, {\"bbox_2d\": [508, 297, 942, 649], \"label\": \"Text\"}, {\"bbox_2d\": [62, 343, 483, 566], \"label\": \"Text\"}, {\"bbox_2d\": [62, 322, 432, 339], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 567, 480, 678], \"label\": \"Text\"}, {\"bbox_2d\": [232, 679, 322, 772], \"label\": \"Formula\"}, {\"bbox_2d\": [63, 772, 368, 865], \"label\": \"Text\"}, {\"bbox_2d\": [63, 871, 485, 982], \"label\": \"Text\"}, {\"bbox_2d\": [515, 834, 939, 982], \"label\": \"Text\"}, {\"bbox_2d\": [483, 994, 512, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [510, 657, 932, 816], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [362, 43, 589, 57], \"label\": \"Section-header\"}, {\"bbox_2d\": [362, 61, 392, 71], \"label\": \"Text\"}, {\"bbox_2d\": [362, 75, 936, 648], \"label\": \"Table\"}, {\"bbox_2d\": [323, 699, 935, 998], \"label\": \"Picture\"}, {\"bbox_2d\": [142, 701, 288, 947], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [362, 44, 934, 640], \"label\": \"Table\"}, {\"bbox_2d\": [125, 700, 950, 1000], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 93, 630, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 436, 604, 453], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 128, 918, 154], \"label\": \"Text\"}, {\"bbox_2d\": [231, 172, 922, 198], \"label\": \"Text\"}, {\"bbox_2d\": [231, 471, 920, 529], \"label\": \"Text\"}, {\"bbox_2d\": [241, 553, 911, 614], \"label\": \"Text\"}, {\"bbox_2d\": [231, 639, 847, 652], \"label\": \"Text\"}, {\"bbox_2d\": [231, 661, 652, 674], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 683, 920, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 869, 635, 879], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 398, 625, 409], \"label\": \"Caption\"}, {\"bbox_2d\": [230, 218, 928, 393], \"label\": \"Picture\"}, {\"bbox_2d\": [232, 825, 926, 864], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [109, 93, 629, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 436, 604, 451], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 922, 155], \"label\": \"Text\"}, {\"bbox_2d\": [231, 172, 926, 199], \"label\": \"Text\"}, {\"bbox_2d\": [231, 470, 926, 529], \"label\": \"Text\"}, {\"bbox_2d\": [241, 554, 916, 613], \"label\": \"Text\"}, {\"bbox_2d\": [231, 639, 848, 651], \"label\": \"Text\"}, {\"bbox_2d\": [231, 661, 657, 673], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 683, 920, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 218, 928, 394], \"label\": \"Picture\"}, {\"bbox_2d\": [235, 713, 928, 865], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 868, 635, 879], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 398, 625, 409], \"label\": \"Caption\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [71, 107, 175, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [71, 495, 156, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [71, 692, 328, 706], \"label\": \"Section-header\"}, {\"bbox_2d\": [75, 133, 305, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 203, 351, 245], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 258, 347, 300], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 311, 316, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 365, 252, 378], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 389, 252, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 528, 242, 555], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 566, 253, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 604, 329, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 723, 260, 750], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 761, 279, 805], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 816, 239, 843], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 854, 249, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 723, 615, 750], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 761, 557, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 811, 580, 828], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 839, 560, 866], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 619, 616, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 581, 605, 608], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 528, 545, 571], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 389, 562, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 303, 622, 378], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 203, 552, 262], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 133, 621, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 133, 907, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 203, 924, 245], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 258, 814, 285], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 311, 867, 338], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 349, 816, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 387, 907, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 528, 933, 555], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 566, 843, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 604, 843, 631], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 723, 853, 750], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 761, 840, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 801, 908, 843], \"label\": \"List-item\"}, {\"bbox_2d\": [679, 854, 847, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [75, 999, 419, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [68, 104, 176, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [75, 134, 302, 193], \"label\": \"Text\"}, {\"bbox_2d\": [377, 134, 621, 193], \"label\": \"Text\"}, {\"bbox_2d\": [678, 134, 907, 193], \"label\": \"Text\"}, {\"bbox_2d\": [678, 203, 923, 247], \"label\": \"Text\"}, {\"bbox_2d\": [377, 203, 603, 294], \"label\": \"Text\"}, {\"bbox_2d\": [76, 203, 352, 247], \"label\": \"Text\"}, {\"bbox_2d\": [76, 257, 345, 300], \"label\": \"Text\"}, {\"bbox_2d\": [76, 310, 339, 354], \"label\": \"Text\"}, {\"bbox_2d\": [377, 304, 625, 379], \"label\": \"Text\"}, {\"bbox_2d\": [377, 389, 561, 432], \"label\": \"Text\"}, {\"bbox_2d\": [76, 364, 253, 392], \"label\": \"Text\"}, {\"bbox_2d\": [76, 402, 324, 446], \"label\": \"Text\"}, {\"bbox_2d\": [76, 527, 241, 555], \"label\": \"Text\"}, {\"bbox_2d\": [76, 565, 253, 593], \"label\": \"Text\"}, {\"bbox_2d\": [76, 603, 330, 631], \"label\": \"Text\"}, {\"bbox_2d\": [377, 527, 547, 571], \"label\": \"Text\"}, {\"bbox_2d\": [377, 581, 604, 609], \"label\": \"Text\"}, {\"bbox_2d\": [377, 619, 617, 647], \"label\": \"Text\"}, {\"bbox_2d\": [678, 527, 933, 555], \"label\": \"Text\"}, {\"bbox_2d\": [678, 565, 840, 593], \"label\": \"Text\"}, {\"bbox_2d\": [678, 603, 843, 631], \"label\": \"Text\"}, {\"bbox_2d\": [76, 723, 260, 752], \"label\": \"Text\"}, {\"bbox_2d\": [377, 723, 614, 752], \"label\": \"Text\"}, {\"bbox_2d\": [377, 761, 558, 790], \"label\": \"Text\"}, {\"bbox_2d\": [679, 723, 852, 752], \"label\": \"Text\"}, {\"bbox_2d\": [679, 761, 840, 790], \"label\": \"Text\"}, {\"bbox_2d\": [679, 799, 906, 843], \"label\": \"Text\"}, {\"bbox_2d\": [679, 853, 846, 881], \"label\": \"Text\"}, {\"bbox_2d\": [377, 799, 580, 828], \"label\": \"Text\"}, {\"bbox_2d\": [377, 837, 559, 865], \"label\": \"Text\"}, {\"bbox_2d\": [76, 761, 279, 805], \"label\": \"Text\"}, {\"bbox_2d\": [76, 815, 237, 843], \"label\": \"Text\"}, {\"bbox_2d\": [76, 853, 249, 881], \"label\": \"Text\"}, {\"bbox_2d\": [76, 999, 420, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [68, 689, 329, 708], \"label\": \"Section-header\"}, {\"bbox_2d\": [68, 493, 158, 511], \"label\": \"Section-header\"}, {\"bbox_2d\": [678, 257, 826, 269], \"label\": \"Section-header\"}, {\"bbox_2d\": [678, 272, 851, 300], \"label\": \"List-item\"}, {\"bbox_2d\": [678, 310, 870, 338], \"label\": \"List-item\"}, {\"bbox_2d\": [678, 348, 817, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [678, 386, 908, 414], \"label\": \"List-item\"}, {\"bbox_2d\": [693, 418, 832, 430], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 448, 911, 604], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 634, 911, 786], \"label\": \"Text\"}, {\"bbox_2d\": [121, 797, 911, 950], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 93, 903, 415], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [128, 90, 899, 423], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 449, 911, 604], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 635, 911, 786], \"label\": \"Text\"}, {\"bbox_2d\": [121, 800, 911, 950], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [937, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 815, 84], \"label\": \"Text\"}, {\"bbox_2d\": [536, 111, 951, 204], \"label\": \"Text\"}, {\"bbox_2d\": [121, 96, 448, 364], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 398, 465, 663], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 681, 506, 720], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 758, 506, 896], \"label\": \"Text\"}, {\"bbox_2d\": [91, 903, 506, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 263, 951, 293], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 295, 951, 334], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 336, 951, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 367, 951, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 449, 951, 469], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 471, 951, 491], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 493, 951, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 525, 951, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 566, 951, 618], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 620, 951, 690], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 692, 951, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 724, 951, 753], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 755, 951, 774], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 776, 951, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 798, 951, 817], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 819, 951, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 851, 951, 896], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 898, 951, 928], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 930, 951, 948], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 704, 951, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 681, 951, 704], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 661, 951, 681], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 564, 951, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 541, 951, 561], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 525, 951, 541], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 471, 951, 491], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 449, 951, 469], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 415, 951, 435], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 374, 951, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 316, 951, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 295, 951, 315], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [113, 91, 461, 661], \"label\": \"Picture\"}, {\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 816, 84], \"label\": \"Text\"}, {\"bbox_2d\": [536, 113, 951, 204], \"label\": \"Text\"}, {\"bbox_2d\": [91, 680, 506, 719], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 758, 506, 896], \"label\": \"Text\"}, {\"bbox_2d\": [91, 902, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [544, 264, 951, 290], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 291, 951, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 332, 939, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 346, 951, 372], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 374, 951, 440], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 442, 951, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 469, 951, 495], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 497, 948, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 510, 951, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 538, 951, 563], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 565, 934, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 579, 951, 604], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 606, 951, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 633, 951, 700], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 702, 951, 727], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 729, 862, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 743, 951, 768], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 770, 951, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 798, 951, 823], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 825, 951, 851], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 852, 951, 864], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 866, 951, 892], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 894, 951, 919], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 921, 951, 946], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 83, 875, 113], \"label\": \"Text\"}, {\"bbox_2d\": [293, 125, 665, 135], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 139, 934, 657], \"label\": \"Table\"}, {\"bbox_2d\": [293, 686, 878, 714], \"label\": \"Text\"}, {\"bbox_2d\": [644, 979, 932, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [462, 542, 645, 554], \"label\": \"Section-header\"}, {\"bbox_2d\": [462, 446, 934, 529], \"label\": \"Text\"}, {\"bbox_2d\": [462, 348, 916, 429], \"label\": \"Text\"}, {\"bbox_2d\": [462, 265, 935, 332], \"label\": \"Text\"}, {\"bbox_2d\": [462, 166, 924, 192], \"label\": \"Text\"}, {\"bbox_2d\": [462, 195, 935, 250], \"label\": \"Text\"}, {\"bbox_2d\": [462, 564, 550, 576], \"label\": \"Text\"}, {\"bbox_2d\": [462, 588, 550, 599], \"label\": \"Text\"}, {\"bbox_2d\": [462, 625, 928, 652], \"label\": \"Text\"}, {\"bbox_2d\": [462, 610, 553, 621], \"label\": \"Text\"}, {\"bbox_2d\": [462, 541, 645, 554], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 875, 114], \"label\": \"Text\"}, {\"bbox_2d\": [293, 126, 666, 136], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 137, 931, 656], \"label\": \"Table\"}, {\"bbox_2d\": [293, 685, 877, 714], \"label\": \"List-item\"}, {\"bbox_2d\": [644, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [775, 68, 974, 95], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 994, 563, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [574, 928, 985, 980], \"label\": \"Text\"}, {\"bbox_2d\": [121, 827, 535, 982], \"label\": \"Text\"}, {\"bbox_2d\": [574, 694, 984, 859], \"label\": \"Text\"}, {\"bbox_2d\": [118, 113, 534, 279], \"label\": \"Text\"}, {\"bbox_2d\": [567, 116, 983, 392], \"label\": \"Text\"}, {\"bbox_2d\": [571, 396, 983, 541], \"label\": \"Text\"}, {\"bbox_2d\": [118, 283, 534, 821], \"label\": \"Text\"}, {\"bbox_2d\": [573, 545, 983, 690], \"label\": \"Text\"}, {\"bbox_2d\": [574, 894, 873, 905], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [775, 67, 990, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [566, 113, 996, 393], \"label\": \"Text\"}, {\"bbox_2d\": [116, 112, 538, 279], \"label\": \"Text\"}, {\"bbox_2d\": [118, 280, 547, 823], \"label\": \"Text\"}, {\"bbox_2d\": [570, 394, 992, 542], \"label\": \"Text\"}, {\"bbox_2d\": [571, 543, 995, 692], \"label\": \"Text\"}, {\"bbox_2d\": [572, 692, 998, 860], \"label\": \"Text\"}, {\"bbox_2d\": [120, 825, 544, 982], \"label\": \"Text\"}, {\"bbox_2d\": [575, 926, 992, 982], \"label\": \"Text\"}, {\"bbox_2d\": [575, 891, 887, 909], \"label\": \"Section-header\"}, {\"bbox_2d\": [538, 991, 566, 1006], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 149, 911, 220], \"label\": \"Text\"}, {\"bbox_2d\": [121, 231, 911, 576], \"label\": \"Text\"}, {\"bbox_2d\": [121, 587, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 910, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 150, 911, 220], \"label\": \"Text\"}, {\"bbox_2d\": [121, 232, 911, 576], \"label\": \"Text\"}, {\"bbox_2d\": [121, 588, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [292, 84, 753, 99], \"label\": \"Caption\"}, {\"bbox_2d\": [592, 979, 932, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [291, 97, 933, 885], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [293, 86, 755, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 98, 931, 884], \"label\": \"Table\"}, {\"bbox_2d\": [593, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [198, 67, 452, 101], \"label\": \"Section-header\"}, {\"bbox_2d\": [198, 916, 907, 930], \"label\": \"Text\"}, {\"bbox_2d\": [198, 139, 944, 888], \"label\": \"Table\"}, {\"bbox_2d\": [112, 947, 131, 962], \"label\": \"Page-footer\"}, {\"bbox_2d\": [38, 958, 146, 1017], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [198, 69, 453, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [199, 917, 907, 929], \"label\": \"Text\"}, {\"bbox_2d\": [193, 137, 952, 495], \"label\": \"Table\"}, {\"bbox_2d\": [196, 518, 947, 615], \"label\": \"Table\"}, {\"bbox_2d\": [196, 637, 949, 771], \"label\": \"Table\"}, {\"bbox_2d\": [195, 791, 947, 890], \"label\": \"Table\"}, {\"bbox_2d\": [113, 948, 133, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [39, 958, 147, 1020], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 123, 635, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 306, 732, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 165, 901, 194], \"label\": \"Text\"}, {\"bbox_2d\": [293, 336, 900, 380], \"label\": \"Text\"}, {\"bbox_2d\": [316, 391, 846, 661], \"label\": \"Picture\"}, {\"bbox_2d\": [904, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 201, 918, 245], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 252, 720, 264], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 271, 622, 283], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 636, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 307, 732, 324], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 166, 900, 195], \"label\": \"Text\"}, {\"bbox_2d\": [293, 199, 917, 246], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 250, 720, 265], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 269, 621, 285], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 335, 898, 380], \"label\": \"Text\"}, {\"bbox_2d\": [493, 391, 662, 403], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 415, 844, 661], \"label\": \"Text\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [81, 45, 329, 237], \"label\": \"Picture\"}, {\"bbox_2d\": [81, 382, 657, 440], \"label\": \"Text\"}, {\"bbox_2d\": [81, 464, 345, 602], \"label\": \"Text\"}, {\"bbox_2d\": [81, 613, 345, 797], \"label\": \"Text\"}, {\"bbox_2d\": [81, 808, 345, 976], \"label\": \"Text\"}, {\"bbox_2d\": [389, 464, 654, 618], \"label\": \"Text\"}, {\"bbox_2d\": [389, 629, 655, 891], \"label\": \"Text\"}, {\"bbox_2d\": [698, 382, 958, 505], \"label\": \"Text\"}, {\"bbox_2d\": [698, 516, 954, 574], \"label\": \"Text\"}, {\"bbox_2d\": [698, 585, 938, 613], \"label\": \"Text\"}, {\"bbox_2d\": [389, 93, 889, 175], \"label\": \"Text\"}, {\"bbox_2d\": [389, 187, 497, 201], \"label\": \"Caption\"}, {\"bbox_2d\": [697, 726, 962, 765], \"label\": \"Caption\"}, {\"bbox_2d\": [647, 998, 974, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [579, 313, 973, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [696, 633, 878, 711], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [377, 93, 891, 173], \"label\": \"Text\"}, {\"bbox_2d\": [390, 190, 499, 200], \"label\": \"Caption\"}, {\"bbox_2d\": [83, 46, 329, 236], \"label\": \"Picture\"}, {\"bbox_2d\": [83, 465, 348, 601], \"label\": \"Text\"}, {\"bbox_2d\": [83, 613, 349, 796], \"label\": \"Text\"}, {\"bbox_2d\": [83, 808, 350, 976], \"label\": \"Text\"}, {\"bbox_2d\": [390, 465, 657, 617], \"label\": \"Text\"}, {\"bbox_2d\": [390, 629, 659, 891], \"label\": \"Text\"}, {\"bbox_2d\": [647, 1000, 975, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [697, 728, 962, 764], \"label\": \"Caption\"}, {\"bbox_2d\": [697, 384, 962, 504], \"label\": \"Text\"}, {\"bbox_2d\": [697, 516, 956, 574], \"label\": \"Text\"}, {\"bbox_2d\": [697, 585, 943, 612], \"label\": \"Text\"}, {\"bbox_2d\": [693, 634, 881, 719], \"label\": \"Picture\"}, {\"bbox_2d\": [577, 314, 979, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [83, 383, 656, 439], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [49, 68, 255, 101], \"label\": \"Page-header\"}, {\"bbox_2d\": [49, 111, 461, 186], \"label\": \"Text\"}, {\"bbox_2d\": [49, 187, 462, 374], \"label\": \"Text\"}, {\"bbox_2d\": [49, 375, 461, 505], \"label\": \"Text\"}, {\"bbox_2d\": [49, 506, 462, 636], \"label\": \"Text\"}, {\"bbox_2d\": [49, 637, 464, 871], \"label\": \"Text\"}, {\"bbox_2d\": [49, 872, 464, 973], \"label\": \"Text\"}, {\"bbox_2d\": [500, 109, 918, 448], \"label\": \"Text\"}, {\"bbox_2d\": [500, 448, 921, 910], \"label\": \"Text\"}, {\"bbox_2d\": [500, 911, 917, 972], \"label\": \"Text\"}, {\"bbox_2d\": [473, 994, 502, 1012], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [52, 68, 269, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [500, 109, 928, 446], \"label\": \"Text\"}, {\"bbox_2d\": [48, 111, 468, 184], \"label\": \"Text\"}, {\"bbox_2d\": [49, 185, 470, 371], \"label\": \"Text\"}, {\"bbox_2d\": [50, 372, 471, 502], \"label\": \"Text\"}, {\"bbox_2d\": [50, 503, 473, 633], \"label\": \"Text\"}, {\"bbox_2d\": [503, 447, 931, 907], \"label\": \"Text\"}, {\"bbox_2d\": [50, 634, 484, 866], \"label\": \"Text\"}, {\"bbox_2d\": [51, 870, 477, 968], \"label\": \"Text\"}, {\"bbox_2d\": [505, 910, 929, 968], \"label\": \"Text\"}, {\"bbox_2d\": [473, 990, 518, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [292, 86, 932, 707], \"label\": \"Table\"}, {\"bbox_2d\": [292, 734, 926, 765], \"label\": \"Text\"}, {\"bbox_2d\": [703, 977, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 737, 923, 765], \"label\": \"Text\"}, {\"bbox_2d\": [292, 84, 932, 707], \"label\": \"Table\"}, {\"bbox_2d\": [703, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [52, 167, 985, 325], \"label\": \"Picture\"}, {\"bbox_2d\": [0, 340, 714, 788], \"label\": \"Picture\"}, {\"bbox_2d\": [52, 823, 269, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [293, 823, 506, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [531, 823, 746, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [772, 823, 985, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [811, 999, 994, 1017], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [0, 340, 986, 806], \"label\": \"Picture\"}, {\"bbox_2d\": [51, 187, 268, 326], \"label\": \"Picture\"}, {\"bbox_2d\": [292, 185, 508, 324], \"label\": \"Picture\"}, {\"bbox_2d\": [532, 186, 748, 325], \"label\": \"Picture\"}, {\"bbox_2d\": [771, 188, 985, 324], \"label\": \"Picture\"}, {\"bbox_2d\": [780, 173, 881, 184], \"label\": \"Caption\"}, {\"bbox_2d\": [540, 173, 655, 184], \"label\": \"Caption\"}, {\"bbox_2d\": [301, 173, 501, 184], \"label\": \"Caption\"}, {\"bbox_2d\": [61, 173, 133, 184], \"label\": \"Caption\"}, {\"bbox_2d\": [61, 828, 161, 838], \"label\": \"Caption\"}, {\"bbox_2d\": [301, 828, 426, 838], \"label\": \"Caption\"}, {\"bbox_2d\": [540, 828, 695, 838], \"label\": \"Caption\"}, {\"bbox_2d\": [780, 828, 875, 838], \"label\": \"Caption\"}, {\"bbox_2d\": [53, 842, 268, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [293, 841, 509, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [532, 840, 749, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [773, 841, 989, 981], \"label\": \"Picture\"}, {\"bbox_2d\": [865, 1003, 999, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [234, 89, 790, 463], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 501, 911, 630], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 658, 911, 703], \"label\": \"Text\"}, {\"bbox_2d\": [121, 716, 911, 948], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [239, 87, 792, 460], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 501, 911, 629], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 659, 910, 702], \"label\": \"Text\"}, {\"bbox_2d\": [121, 714, 911, 949], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 46, 708, 113], \"label\": \"Text\"}, {\"bbox_2d\": [122, 125, 710, 179], \"label\": \"Text\"}, {\"bbox_2d\": [122, 205, 712, 259], \"label\": \"Text\"}, {\"bbox_2d\": [122, 285, 708, 337], \"label\": \"Text\"}, {\"bbox_2d\": [122, 349, 712, 389], \"label\": \"Text\"}, {\"bbox_2d\": [122, 415, 682, 441], \"label\": \"Text\"}, {\"bbox_2d\": [122, 452, 350, 465], \"label\": \"Text\"}, {\"bbox_2d\": [122, 556, 714, 595], \"label\": \"Text\"}, {\"bbox_2d\": [122, 621, 712, 689], \"label\": \"Text\"}, {\"bbox_2d\": [122, 701, 708, 755], \"label\": \"Text\"}, {\"bbox_2d\": [122, 766, 224, 779], \"label\": \"Text\"}, {\"bbox_2d\": [122, 849, 187, 864], \"label\": \"Caption\"}, {\"bbox_2d\": [934, 988, 944, 1000], \"label\": \"Page-footer\"}, {\"bbox_2d\": [110, 776, 339, 844], \"label\": \"Picture\"}, {\"bbox_2d\": [122, 609, 320, 622], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 400, 301, 413], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 272, 310, 285], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 190, 292, 203], \"label\": \"Section-header\"}, {\"bbox_2d\": [154, 477, 434, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 490, 375, 502], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 503, 419, 516], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 517, 627, 530], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 531, 356, 544], \"label\": \"List-item\"}, {\"bbox_2d\": [746, 85, 937, 308], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [746, 85, 943, 309], \"label\": \"Text\"}, {\"bbox_2d\": [121, 47, 706, 113], \"label\": \"Text\"}, {\"bbox_2d\": [121, 126, 709, 179], \"label\": \"Text\"}, {\"bbox_2d\": [121, 205, 711, 258], \"label\": \"Text\"}, {\"bbox_2d\": [121, 284, 713, 337], \"label\": \"Text\"}, {\"bbox_2d\": [121, 349, 711, 389], \"label\": \"Text\"}, {\"bbox_2d\": [121, 415, 686, 440], \"label\": \"Text\"}, {\"bbox_2d\": [121, 453, 348, 465], \"label\": \"Text\"}, {\"bbox_2d\": [121, 191, 296, 203], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 270, 310, 282], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 401, 301, 413], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 608, 320, 620], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 556, 714, 595], \"label\": \"Text\"}, {\"bbox_2d\": [121, 621, 711, 688], \"label\": \"Text\"}, {\"bbox_2d\": [121, 701, 708, 754], \"label\": \"Text\"}, {\"bbox_2d\": [121, 766, 223, 778], \"label\": \"Text\"}, {\"bbox_2d\": [121, 839, 218, 864], \"label\": \"Caption\"}, {\"bbox_2d\": [152, 477, 434, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 491, 374, 502], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 504, 418, 516], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 518, 626, 530], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 532, 355, 544], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 782, 335, 835], \"label\": \"Picture\"}, {\"bbox_2d\": [934, 990, 942, 998], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 83, 464, 125], \"label\": \"Text\"}, {\"bbox_2d\": [101, 138, 486, 180], \"label\": \"Text\"}, {\"bbox_2d\": [101, 194, 467, 221], \"label\": \"Text\"}, {\"bbox_2d\": [101, 234, 473, 304], \"label\": \"Text\"}, {\"bbox_2d\": [101, 318, 459, 359], \"label\": \"Text\"}, {\"bbox_2d\": [101, 373, 472, 457], \"label\": \"Text\"}, {\"bbox_2d\": [101, 471, 474, 513], \"label\": \"Text\"}, {\"bbox_2d\": [101, 527, 484, 608], \"label\": \"Text\"}, {\"bbox_2d\": [101, 622, 417, 650], \"label\": \"Text\"}, {\"bbox_2d\": [101, 664, 474, 705], \"label\": \"Text\"}, {\"bbox_2d\": [101, 719, 485, 760], \"label\": \"Text\"}, {\"bbox_2d\": [101, 774, 485, 843], \"label\": \"Text\"}, {\"bbox_2d\": [101, 857, 441, 884], \"label\": \"Text\"}, {\"bbox_2d\": [101, 898, 485, 952], \"label\": \"Text\"}, {\"bbox_2d\": [101, 980, 378, 992], \"label\": \"Page-footer\"}, {\"bbox_2d\": [526, 85, 885, 125], \"label\": \"Text\"}, {\"bbox_2d\": [526, 139, 911, 165], \"label\": \"Text\"}, {\"bbox_2d\": [526, 179, 902, 206], \"label\": \"Text\"}, {\"bbox_2d\": [526, 220, 901, 289], \"label\": \"Text\"}, {\"bbox_2d\": [526, 303, 878, 344], \"label\": \"Text\"}, {\"bbox_2d\": [526, 358, 902, 386], \"label\": \"Text\"}, {\"bbox_2d\": [526, 399, 905, 468], \"label\": \"Text\"}, {\"bbox_2d\": [526, 482, 834, 494], \"label\": \"Text\"}, {\"bbox_2d\": [526, 508, 890, 562], \"label\": \"Text\"}, {\"bbox_2d\": [526, 577, 906, 647], \"label\": \"Text\"}, {\"bbox_2d\": [526, 661, 912, 730], \"label\": \"Text\"}, {\"bbox_2d\": [526, 744, 857, 784], \"label\": \"Text\"}, {\"bbox_2d\": [526, 798, 893, 840], \"label\": \"Text\"}, {\"bbox_2d\": [526, 854, 902, 908], \"label\": \"Text\"}, {\"bbox_2d\": [526, 922, 876, 952], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [101, 85, 465, 125], \"label\": \"Text\"}, {\"bbox_2d\": [101, 140, 486, 180], \"label\": \"Text\"}, {\"bbox_2d\": [101, 194, 467, 221], \"label\": \"Text\"}, {\"bbox_2d\": [101, 235, 485, 304], \"label\": \"Text\"}, {\"bbox_2d\": [101, 318, 459, 359], \"label\": \"Text\"}, {\"bbox_2d\": [101, 373, 474, 457], \"label\": \"Text\"}, {\"bbox_2d\": [101, 471, 475, 512], \"label\": \"Text\"}, {\"bbox_2d\": [101, 526, 482, 610], \"label\": \"Text\"}, {\"bbox_2d\": [101, 624, 419, 650], \"label\": \"Text\"}, {\"bbox_2d\": [101, 664, 473, 705], \"label\": \"Text\"}, {\"bbox_2d\": [101, 719, 485, 760], \"label\": \"Text\"}, {\"bbox_2d\": [101, 774, 485, 843], \"label\": \"Text\"}, {\"bbox_2d\": [101, 858, 441, 884], \"label\": \"Text\"}, {\"bbox_2d\": [101, 898, 486, 953], \"label\": \"Text\"}, {\"bbox_2d\": [526, 85, 884, 125], \"label\": \"Text\"}, {\"bbox_2d\": [526, 140, 911, 166], \"label\": \"Text\"}, {\"bbox_2d\": [526, 180, 901, 206], \"label\": \"Text\"}, {\"bbox_2d\": [526, 221, 900, 290], \"label\": \"Text\"}, {\"bbox_2d\": [526, 304, 878, 345], \"label\": \"Text\"}, {\"bbox_2d\": [526, 359, 902, 385], \"label\": \"Text\"}, {\"bbox_2d\": [526, 399, 905, 469], \"label\": \"Text\"}, {\"bbox_2d\": [526, 483, 834, 495], \"label\": \"Text\"}, {\"bbox_2d\": [526, 509, 888, 564], \"label\": \"Text\"}, {\"bbox_2d\": [526, 578, 906, 648], \"label\": \"Text\"}, {\"bbox_2d\": [526, 800, 892, 841], \"label\": \"Text\"}, {\"bbox_2d\": [526, 855, 902, 910], \"label\": \"Text\"}, {\"bbox_2d\": [526, 924, 875, 950], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [526, 662, 911, 731], \"label\": \"Text\"}, {\"bbox_2d\": [526, 745, 879, 786], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 914, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 164, 914, 206], \"label\": \"Text\"}, {\"bbox_2d\": [121, 270, 494, 288], \"label\": \"Text\"}, {\"bbox_2d\": [121, 389, 914, 433], \"label\": \"Text\"}, {\"bbox_2d\": [121, 467, 914, 510], \"label\": \"Text\"}, {\"bbox_2d\": [121, 575, 914, 618], \"label\": \"Text\"}, {\"bbox_2d\": [121, 632, 914, 645], \"label\": \"Text\"}, {\"bbox_2d\": [121, 740, 368, 757], \"label\": \"Text\"}, {\"bbox_2d\": [121, 794, 312, 806], \"label\": \"Text\"}, {\"bbox_2d\": [121, 900, 408, 917], \"label\": \"Text\"}, {\"bbox_2d\": [502, 989, 511, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [139, 932, 914, 975], \"label\": \"Formula\"}, {\"bbox_2d\": [271, 853, 914, 893], \"label\": \"Formula\"}, {\"bbox_2d\": [199, 768, 914, 815], \"label\": \"Formula\"}, {\"bbox_2d\": [201, 666, 914, 709], \"label\": \"Formula\"}, {\"bbox_2d\": [464, 556, 914, 585], \"label\": \"Formula\"}, {\"bbox_2d\": [234, 446, 914, 489], \"label\": \"Formula\"}, {\"bbox_2d\": [169, 277, 914, 381], \"label\": \"Formula\"}, {\"bbox_2d\": [270, 221, 914, 258], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 98, 911, 133], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 165, 911, 205], \"label\": \"Text\"}, {\"bbox_2d\": [121, 271, 495, 286], \"label\": \"Text\"}, {\"bbox_2d\": [121, 390, 911, 431], \"label\": \"Text\"}, {\"bbox_2d\": [121, 496, 911, 537], \"label\": \"Text\"}, {\"bbox_2d\": [121, 603, 911, 644], \"label\": \"Text\"}, {\"bbox_2d\": [121, 741, 481, 755], \"label\": \"Text\"}, {\"bbox_2d\": [121, 821, 425, 835], \"label\": \"Text\"}, {\"bbox_2d\": [147, 901, 545, 915], \"label\": \"Text\"}, {\"bbox_2d\": [270, 218, 910, 265], \"label\": \"Formula\"}, {\"bbox_2d\": [197, 298, 910, 383], \"label\": \"Formula\"}, {\"bbox_2d\": [232, 442, 910, 490], \"label\": \"Formula\"}, {\"bbox_2d\": [463, 554, 910, 587], \"label\": \"Formula\"}, {\"bbox_2d\": [175, 666, 910, 742], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 771, 910, 805], \"label\": \"Formula\"}, {\"bbox_2d\": [267, 851, 910, 885], \"label\": \"Formula\"}, {\"bbox_2d\": [164, 927, 910, 974], \"label\": \"Formula\"}, {\"bbox_2d\": [502, 989, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 24, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 75, 1009, 585], \"label\": \"Table\"}, {\"bbox_2d\": [504, 615, 529, 625], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [23, 72, 1011, 588], \"label\": \"Table\"}, {\"bbox_2d\": [504, 615, 529, 626], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 99, 446, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 126, 157, 137], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 139, 508, 237], \"label\": \"Text\"}, {\"bbox_2d\": [91, 257, 777, 267], \"label\": \"Text\"}, {\"bbox_2d\": [91, 443, 508, 483], \"label\": \"Text\"}, {\"bbox_2d\": [528, 443, 945, 483], \"label\": \"Text\"}, {\"bbox_2d\": [91, 601, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 640, 508, 693], \"label\": \"Text\"}, {\"bbox_2d\": [91, 859, 508, 962], \"label\": \"Text\"}, {\"bbox_2d\": [91, 588, 284, 599], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 496, 400, 566], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 946, 422], \"label\": \"Table\"}, {\"bbox_2d\": [91, 705, 508, 848], \"label\": \"Table\"}, {\"bbox_2d\": [590, 999, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 849, 508, 962], \"label\": \"Footnote\"}, {\"bbox_2d\": [545, 129, 833, 140], \"label\": \"Section-header\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 142, 508, 238], \"label\": \"Text\"}, {\"bbox_2d\": [91, 601, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 443, 508, 483], \"label\": \"Text\"}, {\"bbox_2d\": [91, 601, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 588, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 640, 508, 693], \"label\": \"Text\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [91, 588, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 640, 508, 693], \"label\": \"Text\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 588, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 601, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 640, 508, 693], \"label\": \"Text\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [91, 588, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 640, 508, 693], \"label\": \"Text\"}, {\"bbox_2d\": [91, 588, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [543, 162, 899, 228], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [91, 496, 508, 566], \"label\": \"Table\"}, {\"bbox_2d\": [527, 496, 945, 561], \"label\": \"Table\"}, {\"bbox_2d\": [91, 283, 508, 422], \"label\": \"Table", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 309], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 98, 445, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 126, 156, 136], \"label\": \"Section-header\"}, {\"bbox_2d\": [591, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [544, 131, 833, 148], \"label\": \"Caption\"}, {\"bbox_2d\": [538, 157, 903, 230], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 140, 507, 237], \"label\": \"Text\"}, {\"bbox_2d\": [91, 258, 776, 268], \"label\": \"Text\"}, {\"bbox_2d\": [91, 443, 507, 481], \"label\": \"Text\"}, {\"bbox_2d\": [528, 443, 944, 481], \"label\": \"Text\"}, {\"bbox_2d\": [88, 276, 948, 426], \"label\": \"Table\"}, {\"bbox_2d\": [89, 492, 511, 570], \"label\": \"Table\"}, {\"bbox_2d\": [526, 492, 944, 559], \"label\": \"Table\"}, {\"bbox_2d\": [91, 589, 284, 599], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 604, 507, 628], \"label\": \"Text\"}, {\"bbox_2d\": [91, 640, 507, 693], \"label\": \"Text\"}, {\"bbox_2d\": [91, 859, 507, 961], \"label\": \"Text\"}, {\"bbox_2d\": [89, 704, 512, 852], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 123, 941, 727], \"label\": \"List-item\"}, {\"bbox_2d\": [504, 988, 514, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [105, 123, 696, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 146, 940, 160], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 169, 940, 206], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 214, 672, 228], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 282, 888, 296], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 237, 940, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 305, 940, 342], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 350, 591, 364], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 373, 940, 410], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 418, 781, 432], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 441, 918, 455], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 464, 727, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 486, 816, 501], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 509, 591, 523], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 532, 940, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 577, 940, 591], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 600, 940, 637], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 645, 876, 659], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 668, 678, 682], \"label\": \"List-item\"}, {\"bbox_2d\": [95, 690, 940, 727], \"label\": \"List-item\"}, {\"bbox_2d\": [503, 989, 513, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [430, 29, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [334, 104, 766, 255], \"label\": \"Picture\"}, {\"bbox_2d\": [202, 299, 881, 352], \"label\": \"Section-header\"}, {\"bbox_2d\": [440, 415, 641, 432], \"label\": \"Section-header\"}, {\"bbox_2d\": [330, 466, 750, 483], \"label\": \"Section-header\"}, {\"bbox_2d\": [180, 517, 902, 532], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [440, 410, 643, 435], \"label\": \"Text\"}, {\"bbox_2d\": [330, 460, 753, 485], \"label\": \"Text\"}, {\"bbox_2d\": [180, 511, 903, 536], \"label\": \"Text\"}, {\"bbox_2d\": [200, 291, 889, 355], \"label\": \"Section-header\"}, {\"bbox_2d\": [0, 604, 1034, 1035], \"label\": \"Picture\"}, {\"bbox_2d\": [334, 96, 775, 260], \"label\": \"Picture\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 231, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 350, 911, 393], \"label\": \"Text\"}, {\"bbox_2d\": [121, 406, 911, 666], \"label\": \"Text\"}, {\"bbox_2d\": [121, 784, 911, 962], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 709, 911, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [174, 123, 911, 337], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 229, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 350, 911, 393], \"label\": \"Text\"}, {\"bbox_2d\": [121, 404, 911, 666], \"label\": \"Text\"}, {\"bbox_2d\": [121, 783, 911, 963], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 710, 911, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [175, 113, 911, 198], \"label\": \"Formula\"}, {\"bbox_2d\": [166, 166, 911, 265], \"label\": \"Formula\"}, {\"bbox_2d\": [159, 247, 911, 351], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [111, 66, 934, 82], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 101, 842, 119], \"label\": \"Text\"}, {\"bbox_2d\": [168, 184, 889, 201], \"label\": \"Text\"}, {\"bbox_2d\": [169, 214, 555, 231], \"label\": \"Text\"}, {\"bbox_2d\": [170, 243, 722, 259], \"label\": \"Text\"}, {\"bbox_2d\": [170, 271, 606, 287], \"label\": \"Text\"}, {\"bbox_2d\": [171, 299, 605, 315], \"label\": \"Text\"}, {\"bbox_2d\": [121, 327, 910, 371], \"label\": \"Text\"}, {\"bbox_2d\": [171, 382, 675, 398], \"label\": \"Text\"}, {\"bbox_2d\": [171, 410, 700, 426], \"label\": \"Text\"}, {\"bbox_2d\": [121, 438, 910, 482], \"label\": \"Text\"}, {\"bbox_2d\": [171, 494, 772, 510], \"label\": \"Text\"}, {\"bbox_2d\": [121, 522, 910, 566], \"label\": \"Text\"}, {\"bbox_2d\": [171, 578, 844, 594], \"label\": \"Text\"}, {\"bbox_2d\": [171, 633, 315, 650], \"label\": \"Text\"}, {\"bbox_2d\": [171, 689, 250, 705], \"label\": \"Text\"}, {\"bbox_2d\": [171, 718, 582, 734], \"label\": \"Text\"}, {\"bbox_2d\": [121, 742, 910, 818], \"label\": \"Text\"}, {\"bbox_2d\": [121, 827, 910, 959], \"label\": \"Text\"}, {\"bbox_2d\": [513, 980, 529, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [168, 158, 263, 175], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [110, 69, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [749, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [120, 102, 840, 118], \"label\": \"Text\"}, {\"bbox_2d\": [170, 158, 259, 173], \"label\": \"Text\"}, {\"bbox_2d\": [171, 184, 888, 201], \"label\": \"Text\"}, {\"bbox_2d\": [172, 213, 554, 229], \"label\": \"Text\"}, {\"bbox_2d\": [172, 241, 722, 257], \"label\": \"Text\"}, {\"bbox_2d\": [172, 268, 602, 286], \"label\": \"Text\"}, {\"bbox_2d\": [173, 297, 602, 314], \"label\": \"Text\"}, {\"bbox_2d\": [123, 324, 910, 370], \"label\": \"Text\"}, {\"bbox_2d\": [173, 380, 675, 397], \"label\": \"Text\"}, {\"bbox_2d\": [174, 407, 699, 425], \"label\": \"Text\"}, {\"bbox_2d\": [124, 435, 910, 481], \"label\": \"Text\"}, {\"bbox_2d\": [174, 491, 770, 508], \"label\": \"Text\"}, {\"bbox_2d\": [127, 519, 910, 565], \"label\": \"Text\"}, {\"bbox_2d\": [174, 575, 842, 592], \"label\": \"Text\"}, {\"bbox_2d\": [173, 632, 313, 648], \"label\": \"Text\"}, {\"bbox_2d\": [173, 687, 249, 704], \"label\": \"Text\"}, {\"bbox_2d\": [127, 715, 910, 816], \"label\": \"Text\"}, {\"bbox_2d\": [127, 825, 912, 956], \"label\": \"Text\"}, {\"bbox_2d\": [513, 981, 528, 989], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 44, 970, 77], \"label\": \"Text\"}, {\"bbox_2d\": [62, 92, 760, 139], \"label\": \"Text\"}, {\"bbox_2d\": [62, 155, 722, 171], \"label\": \"Text\"}, {\"bbox_2d\": [62, 187, 753, 280], \"label\": \"Text\"}, {\"bbox_2d\": [62, 812, 973, 922], \"label\": \"Text\"}, {\"bbox_2d\": [62, 941, 975, 972], \"label\": \"Text\"}, {\"bbox_2d\": [31, 401, 53, 634], \"label\": \"Page-header\"}, {\"bbox_2d\": [64, 297, 977, 798], \"label\": \"Table\"}, {\"bbox_2d\": [770, 103, 968, 254], \"label\": \"Picture\"}, {\"bbox_2d\": [508, 994, 527, 1008], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [62, 43, 969, 77], \"label\": \"Text\"}, {\"bbox_2d\": [62, 91, 760, 140], \"label\": \"Text\"}, {\"bbox_2d\": [62, 153, 720, 171], \"label\": \"Text\"}, {\"bbox_2d\": [62, 185, 756, 281], \"label\": \"Text\"}, {\"bbox_2d\": [62, 810, 979, 923], \"label\": \"Text\"}, {\"bbox_2d\": [62, 939, 976, 973], \"label\": \"Text\"}, {\"bbox_2d\": [508, 992, 527, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [26, 401, 45, 636], \"label\": \"Page-header\"}, {\"bbox_2d\": [762, 103, 970, 251], \"label\": \"Picture\"}, {\"bbox_2d\": [60, 294, 979, 798], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 911, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 154, 911, 196], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 211, 911, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 268, 911, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 324, 911, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 380, 911, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 438, 911, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 495, 911, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 551, 911, 620], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 636, 911, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 693, 911, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 749, 911, 791], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 806, 911, 848], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 864, 911, 905], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 97, 911, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 154, 911, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 210, 911, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 267, 911, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 324, 911, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 380, 911, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 437, 911, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 494, 911, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 551, 911, 620], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 635, 911, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 692, 911, 733], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 749, 911, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 806, 911, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 862, 911, 903], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [52, 53, 646, 77], \"label\": \"Section-header\"}, {\"bbox_2d\": [52, 84, 285, 94], \"label\": \"Text\"}, {\"bbox_2d\": [52, 137, 362, 149], \"label\": \"Section-header\"}, {\"bbox_2d\": [52, 160, 281, 171], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 179, 956, 219], \"label\": \"Text\"}, {\"bbox_2d\": [76, 227, 962, 250], \"label\": \"Text\"}, {\"bbox_2d\": [76, 258, 982, 348], \"label\": \"Text\"}, {\"bbox_2d\": [76, 356, 969, 380], \"label\": \"Text\"}, {\"bbox_2d\": [52, 387, 974, 476], \"label\": \"Table\"}, {\"bbox_2d\": [33, 1003, 220, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [51, 139, 363, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [51, 160, 281, 171], \"label\": \"Section-header\"}, {\"bbox_2d\": [51, 56, 646, 94], \"label\": \"Page-header\"}, {\"bbox_2d\": [77, 180, 957, 217], \"label\": \"Text\"}, {\"bbox_2d\": [77, 226, 963, 250], \"label\": \"Text\"}, {\"bbox_2d\": [77, 258, 982, 348], \"label\": \"Text\"}, {\"bbox_2d\": [77, 357, 971, 381], \"label\": \"Text\"}, {\"bbox_2d\": [50, 387, 981, 476], \"label\": \"Table\"}, {\"bbox_2d\": [33, 1003, 46, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [117, 1004, 220, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [49, 997, 114, 1018], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 89, 567, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 127, 911, 336], \"label\": \"Text\"}, {\"bbox_2d\": [125, 348, 911, 483], \"label\": \"Text\"}, {\"bbox_2d\": [150, 466, 505, 483], \"label\": \"Text\"}, {\"bbox_2d\": [125, 525, 353, 543], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 559, 562, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 593, 337, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 630, 412, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 660, 410, 678], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 696, 352, 714], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 732, 352, 749], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 765, 911, 880], \"label\": \"Text\"}, {\"bbox_2d\": [125, 888, 911, 951], \"label\": \"Text\"}, {\"bbox_2d\": [516, 974, 526, 988], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [125, 88, 568, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 128, 911, 345], \"label\": \"Text\"}, {\"bbox_2d\": [125, 348, 911, 492], \"label\": \"Text\"}, {\"bbox_2d\": [151, 496, 617, 508], \"label\": \"Text\"}, {\"bbox_2d\": [125, 763, 911, 885], \"label\": \"Text\"}, {\"bbox_2d\": [125, 888, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [125, 529, 354, 550], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 549, 562, 595], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 588, 299, 633], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 631, 411, 652], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 662, 411, 686], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 698, 349, 720], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 730, 349, 754], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 974, 522, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 136], \"label\": \"Text\"}, {\"bbox_2d\": [231, 199, 923, 257], \"label\": \"Text\"}, {\"bbox_2d\": [231, 298, 691, 310], \"label\": \"Text\"}, {\"bbox_2d\": [231, 502, 692, 514], \"label\": \"Text\"}, {\"bbox_2d\": [231, 700, 926, 788], \"label\": \"Text\"}, {\"bbox_2d\": [241, 813, 916, 872], \"label\": \"Text\"}, {\"bbox_2d\": [231, 898, 888, 941], \"label\": \"Text\"}, {\"bbox_2d\": [231, 498, 692, 512], \"label\": \"Text\"}, {\"bbox_2d\": [231, 280, 372, 294], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 482, 353, 496], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 656, 456, 675], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 201, 923, 257], \"label\": \"Text\"}, {\"bbox_2d\": [257, 367, 895, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 404, 662, 416], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 427, 700, 439], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 347, 602, 356], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 322, 586, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 569, 749, 581], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 586, 691, 598], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 525, 729, 537], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 547, 750, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 607, 890, 618], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 988, 692, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 926, 136], \"label\": \"Text\"}, {\"bbox_2d\": [231, 198, 923, 257], \"label\": \"Text\"}, {\"bbox_2d\": [231, 298, 691, 310], \"label\": \"Text\"}, {\"bbox_2d\": [109, 163, 787, 179], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 279, 372, 293], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 321, 587, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 343, 602, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 365, 900, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 403, 663, 415], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 425, 700, 437], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 448, 487, 460], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 523, 730, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 546, 750, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 568, 748, 580], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 584, 692, 596], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 606, 892, 618], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 501, 693, 513], \"label\": \"Text\"}, {\"bbox_2d\": [231, 481, 353, 496], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 655, 457, 674], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 697, 926, 788], \"label\": \"Text\"}, {\"bbox_2d\": [241, 813, 916, 872], \"label\": \"Text\"}, {\"bbox_2d\": [231, 898, 887, 941], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [286, 24, 612, 37], \"label\": \"Section-header\"}, {\"bbox_2d\": [68, 955, 68, 971], \"label\": \"Page-header\"}, {\"bbox_2d\": [98, 25, 964, 983], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [98, 689, 963, 976], \"label\": \"Picture\"}, {\"bbox_2d\": [287, 25, 614, 37], \"label\": \"Section-header\"}, {\"bbox_2d\": [676, 36, 836, 44], \"label\": \"Text\"}, {\"bbox_2d\": [685, 55, 849, 64], \"label\": \"Text\"}, {\"bbox_2d\": [99, 983, 315, 989], \"label\": \"Text\"}, {\"bbox_2d\": [108, 75, 361, 83], \"label\": \"Text\"}, {\"bbox_2d\": [111, 85, 147, 93], \"label\": \"Text\"}, {\"bbox_2d\": [108, 130, 608, 138], \"label\": \"Text\"}, {\"bbox_2d\": [109, 99, 147, 108], \"label\": \"Text\"}, {\"bbox_2d\": [172, 84, 269, 93], \"label\": \"Text\"}, {\"bbox_2d\": [318, 84, 415, 93], \"label\": \"Text\"}, {\"bbox_2d\": [464, 84, 561, 93], \"label\": \"Text\"}, {\"bbox_2d\": [757, 84, 844, 93], \"label\": \"Text\"}, {\"bbox_2d\": [610, 84, 707, 93], \"label\": \"Text\"}, {\"bbox_2d\": [109, 148, 239, 156], \"label\": \"Text\"}, {\"bbox_2d\": [109, 162, 600, 170], \"label\": \"Text\"}, {\"bbox_2d\": [109, 173, 148, 182], \"label\": \"Text\"}, {\"bbox_2d\": [172, 173, 210, 182], \"label\": \"Text\"}, {\"bbox_2d\": [109, 214, 808, 222], \"label\": \"Text\"}, {\"bbox_2d\": [109, 256, 788, 264], \"label\": \"Text\"}, {\"bbox_2d\": [110, 276, 283, 285], \"label\": \"Text\"}, {\"bbox_2d\": [297, 276, 336, 285], \"label\": \"Text\"}, {\"bbox_2d\": [108, 340, 393, 348], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 221], \"label\": \"Text\"}, {\"bbox_2d\": [121, 231, 911, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 377, 911, 557], \"label\": \"Text\"}, {\"bbox_2d\": [121, 568, 911, 776], \"label\": \"Text\"}, {\"bbox_2d\": [121, 788, 911, 940], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [443, 336, 911, 356], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 911, 220], \"label\": \"Text\"}, {\"bbox_2d\": [121, 230, 911, 330], \"label\": \"Text\"}, {\"bbox_2d\": [121, 375, 911, 557], \"label\": \"Text\"}, {\"bbox_2d\": [121, 569, 911, 776], \"label\": \"Text\"}, {\"bbox_2d\": [121, 788, 911, 940], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [444, 341, 911, 365], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 68, 951, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 84], \"label\": \"Text\"}, {\"bbox_2d\": [536, 150, 562, 163], \"label\": \"Text\"}, {\"bbox_2d\": [536, 220, 951, 249], \"label\": \"Text\"}, {\"bbox_2d\": [91, 280, 506, 386], \"label\": \"Text\"}, {\"bbox_2d\": [91, 393, 506, 483], \"label\": \"Text\"}, {\"bbox_2d\": [91, 519, 506, 711], \"label\": \"Text\"}, {\"bbox_2d\": [91, 718, 506, 736], \"label\": \"Text\"}, {\"bbox_2d\": [536, 694, 951, 950], \"label\": \"Text\"}, {\"bbox_2d\": [91, 803, 506, 900], \"label\": \"Text\"}, {\"bbox_2d\": [536, 332, 951, 631], \"label\": \"Text\"}, {\"bbox_2d\": [91, 764, 411, 781], \"label\": \"Text\"}, {\"bbox_2d\": [536, 660, 951, 673], \"label\": \"Section-header\"}, {\"bbox_2d\": [210, 484, 389, 496], \"label\": \"Section-header\"}, {\"bbox_2d\": [558, 98, 951, 162], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 177, 951, 213], \"label\": \"Formula\"}, {\"bbox_2d\": [553, 267, 951, 320], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 209, 506, 267], \"label\": \"Formula\"}, {\"bbox_2d\": [123, 63, 476, 130], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 145, 506, 184], \"label\": \"Caption\"}, {\"bbox_2d\": [207, 397, 506, 420], \"label\": \"Formula\"}, {\"bbox_2d\": [253, 719, 506, 773], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 789, 506, 836], \"label\": \"Formula\"}, {\"bbox_2d\": [208, 909, 506, 956], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 84], \"label\": \"Text\"}, {\"bbox_2d\": [536, 151, 563, 164], \"label\": \"Text\"}, {\"bbox_2d\": [536, 220, 951, 248], \"label\": \"Text\"}, {\"bbox_2d\": [557, 100, 951, 149], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 174, 954, 218], \"label\": \"Formula\"}, {\"bbox_2d\": [550, 260, 951, 327], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 145, 506, 193], \"label\": \"Caption\"}, {\"bbox_2d\": [160, 236, 506, 278], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 213, 506, 235], \"label\": \"Text\"}, {\"bbox_2d\": [536, 332, 952, 630], \"label\": \"Text\"}, {\"bbox_2d\": [537, 662, 950, 672], \"label\": \"Section-header\"}, {\"bbox_2d\": [209, 486, 388, 497], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 427, 506, 455], \"label\": \"Text\"}, {\"bbox_2d\": [91, 279, 506, 383], \"label\": \"Text\"}, {\"bbox_2d\": [210, 393, 506, 433], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 518, 506, 720], \"label\": \"Text\"}, {\"bbox_2d\": [536, 694, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 764, 407, 777], \"label\": \"Text\"}, {\"bbox_2d\": [198, 722, 506, 764], \"label\": \"Formula\"}, {\"bbox_2d\": [181, 784, 506, 826], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 827, 506, 910], \"label\": \"Text\"}, {\"bbox_2d\": [206, 909, 506, 958], \"label\": \"Formula\"}, {\"bbox_2d\": [114, 55, 478, 129], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 73, 945, 121], \"label\": \"Text\"}, {\"bbox_2d\": [91, 206, 945, 380], \"label\": \"Text\"}, {\"bbox_2d\": [91, 456, 945, 882], \"label\": \"Text\"}, {\"bbox_2d\": [508, 990, 527, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 415, 499, 434], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 163, 386, 183], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 75, 944, 120], \"label\": \"Text\"}, {\"bbox_2d\": [91, 208, 944, 379], \"label\": \"Text\"}, {\"bbox_2d\": [91, 459, 944, 880], \"label\": \"Text\"}, {\"bbox_2d\": [508, 993, 527, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 418, 499, 432], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 165, 386, 181], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [749, 68, 931, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [111, 69, 203, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 104, 413, 118], \"label\": \"Text\"}, {\"bbox_2d\": [121, 131, 903, 175], \"label\": \"Text\"}, {\"bbox_2d\": [121, 187, 903, 259], \"label\": \"Text\"}, {\"bbox_2d\": [169, 271, 722, 287], \"label\": \"Text\"}, {\"bbox_2d\": [169, 300, 579, 315], \"label\": \"Text\"}, {\"bbox_2d\": [169, 328, 690, 342], \"label\": \"Text\"}, {\"bbox_2d\": [169, 354, 747, 369], \"label\": \"Text\"}, {\"bbox_2d\": [121, 382, 909, 511], \"label\": \"Text\"}, {\"bbox_2d\": [121, 520, 912, 620], \"label\": \"Text\"}, {\"bbox_2d\": [121, 632, 912, 732], \"label\": \"Text\"}, {\"bbox_2d\": [121, 744, 912, 928], \"label\": \"Text\"}, {\"bbox_2d\": [178, 940, 908, 955], \"label\": \"Text\"}, {\"bbox_2d\": [513, 982, 529, 989], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [749, 69, 931, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [110, 69, 203, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 103, 412, 120], \"label\": \"Text\"}, {\"bbox_2d\": [121, 129, 901, 175], \"label\": \"Text\"}, {\"bbox_2d\": [122, 185, 905, 260], \"label\": \"Text\"}, {\"bbox_2d\": [126, 519, 912, 621], \"label\": \"Text\"}, {\"bbox_2d\": [127, 631, 913, 733], \"label\": \"Text\"}, {\"bbox_2d\": [129, 741, 910, 929], \"label\": \"Text\"}, {\"bbox_2d\": [179, 937, 906, 956], \"label\": \"Text\"}, {\"bbox_2d\": [513, 981, 528, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [170, 269, 721, 287], \"label\": \"Text\"}, {\"bbox_2d\": [170, 297, 578, 315], \"label\": \"Text\"}, {\"bbox_2d\": [170, 325, 691, 343], \"label\": \"Text\"}, {\"bbox_2d\": [172, 353, 746, 371], \"label\": \"Text\"}, {\"bbox_2d\": [124, 380, 910, 510], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 43, 300, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 73, 961, 182], \"label\": \"Text\"}, {\"bbox_2d\": [85, 197, 346, 213], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 228, 128, 241], \"label\": \"Text\"}, {\"bbox_2d\": [85, 1001, 104, 1012], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 38, 299, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 192, 343, 215], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 997, 100, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 225, 123, 241], \"label\": \"Text\"}, {\"bbox_2d\": [85, 71, 962, 182], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 125, 557, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 165, 783, 178], \"label\": \"Text\"}, {\"bbox_2d\": [292, 534, 504, 544], \"label\": \"Caption\"}, {\"bbox_2d\": [906, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [307, 218, 707, 511], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 558, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 166, 784, 179], \"label\": \"Text\"}, {\"bbox_2d\": [293, 534, 506, 545], \"label\": \"Caption\"}, {\"bbox_2d\": [306, 220, 708, 508], \"label\": \"Picture\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [272, 89, 910, 383], \"label\": \"Picture\"}, {\"bbox_2d\": [272, 412, 905, 456], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [273, 412, 905, 457], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [479, 94, 641, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [297, 352, 601, 364], \"label\": \"Text\"}, {\"bbox_2d\": [297, 140, 729, 235], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [78, 47, 465, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 48, 465, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [162, 601, 288, 611], \"label\": \"Text\"}, {\"bbox_2d\": [161, 576, 877, 882], \"label\": \"Table\"}, {\"bbox_2d\": [104, 189, 910, 456], \"label\": \"Picture\"}, {\"bbox_2d\": [516, 974, 521, 990], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [77, 49, 468, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [161, 601, 879, 883], \"label\": \"Table\"}, {\"bbox_2d\": [143, 211, 866, 443], \"label\": \"Picture\"}, {\"bbox_2d\": [515, 977, 523, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 38, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 71, 506, 303], \"label\": \"Text\"}, {\"bbox_2d\": [91, 325, 506, 506], \"label\": \"Text\"}, {\"bbox_2d\": [91, 586, 506, 719], \"label\": \"Text\"}, {\"bbox_2d\": [91, 725, 506, 810], \"label\": \"Text\"}, {\"bbox_2d\": [91, 810, 506, 921], \"label\": \"Text\"}, {\"bbox_2d\": [536, 824, 951, 946], \"label\": \"Text\"}, {\"bbox_2d\": [536, 777, 951, 806], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 541, 474, 568], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 368, 951, 453], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 495, 951, 719], \"label\": \"Text\"}, {\"bbox_2d\": [536, 78, 916, 356], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 927, 506, 946], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 367, 951, 452], \"label\": \"Caption\"}, {\"bbox_2d\": [555, 73, 918, 349], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 71, 506, 294], \"label\": \"Text\"}, {\"bbox_2d\": [91, 296, 506, 504], \"label\": \"Text\"}, {\"bbox_2d\": [536, 494, 951, 716], \"label\": \"Text\"}, {\"bbox_2d\": [122, 540, 475, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [543, 752, 944, 777], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 587, 506, 719], \"label\": \"Text\"}, {\"bbox_2d\": [91, 722, 506, 810], \"label\": \"Text\"}, {\"bbox_2d\": [91, 813, 506, 901], \"label\": \"Text\"}, {\"bbox_2d\": [91, 904, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 799, 951, 947], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [44, 144, 322, 227], \"label\": \"Text\"}, {\"bbox_2d\": [44, 237, 322, 474], \"label\": \"Text\"}, {\"bbox_2d\": [343, 144, 626, 519], \"label\": \"Text\"}, {\"bbox_2d\": [44, 525, 324, 674], \"label\": \"Text\"}, {\"bbox_2d\": [343, 525, 626, 674], \"label\": \"Text\"}, {\"bbox_2d\": [646, 453, 885, 464], \"label\": \"Caption\"}, {\"bbox_2d\": [646, 171, 923, 341], \"label\": \"Text\"}, {\"bbox_2d\": [646, 144, 741, 163], \"label\": \"Section-header\"}, {\"bbox_2d\": [44, 499, 124, 519], \"label\": \"Section-header\"}, {\"bbox_2d\": [645, 359, 766, 441], \"label\": \"Picture\"}, {\"bbox_2d\": [696, 519, 904, 686], \"label\": \"Picture\"}, {\"bbox_2d\": [768, 724, 927, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [476, 724, 747, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [406, 724, 565, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [223, 724, 384, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [42, 724, 202, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [44, 878, 206, 905], \"label\": \"Caption\"}, {\"bbox_2d\": [225, 878, 368, 905], \"label\": \"Caption\"}, {\"bbox_2d\": [409, 878, 516, 905], \"label\": \"Caption\"}, {\"bbox_2d\": [589, 878, 686, 905], \"label\": \"Caption\"}, {\"bbox_2d\": [768, 878, 850, 905], \"label\": \"Caption\"}, {\"bbox_2d\": [938, 929, 953, 944], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [645, 145, 741, 164], \"label\": \"Section-header\"}, {\"bbox_2d\": [44, 501, 124, 520], \"label\": \"Section-header\"}, {\"bbox_2d\": [44, 145, 321, 228], \"label\": \"Text\"}, {\"bbox_2d\": [44, 235, 325, 474], \"label\": \"Text\"}, {\"bbox_2d\": [44, 525, 330, 676], \"label\": \"Text\"}, {\"bbox_2d\": [343, 145, 627, 519], \"label\": \"Text\"}, {\"bbox_2d\": [343, 525, 627, 676], \"label\": \"Text\"}, {\"bbox_2d\": [642, 170, 924, 342], \"label\": \"Text\"}, {\"bbox_2d\": [646, 357, 765, 439], \"label\": \"Picture\"}, {\"bbox_2d\": [642, 453, 888, 464], \"label\": \"Caption\"}, {\"bbox_2d\": [695, 522, 903, 690], \"label\": \"Picture\"}, {\"bbox_2d\": [42, 724, 203, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [223, 724, 384, 869], \"label\": \"Picture\"}, {\"bbox_2d\": [405, 724, 566, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [587, 724, 748, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [768, 724, 927, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [939, 928, 949, 944], \"label\": \"Page-footer\"}, {\"bbox_2d\": [44, 879, 205, 903], \"label\": \"Text\"}, {\"bbox_2d\": [225, 879, 371, 903], \"label\": \"Text\"}, {\"bbox_2d\": [406, 879, 516, 903], \"label\": \"Text\"}, {\"bbox_2d\": [588, 879, 682, 903], \"label\": \"Text\"}, {\"bbox_2d\": [769, 879, 855, 903], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 123, 516, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 165, 408, 181], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 431, 506, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [182, 193, 926, 252], \"label\": \"Text\"}, {\"bbox_2d\": [293, 261, 814, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 280, 932, 306], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 314, 921, 357], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 366, 901, 409], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 470, 875, 497], \"label\": \"Text\"}, {\"bbox_2d\": [293, 511, 581, 520], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 522, 932, 941], \"label\": \"Table\"}, {\"bbox_2d\": [906, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 515, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 165, 406, 181], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 432, 506, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 522, 933, 944], \"label\": \"Table\"}, {\"bbox_2d\": [293, 193, 922, 253], \"label\": \"Text\"}, {\"bbox_2d\": [293, 258, 813, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 277, 931, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 312, 920, 359], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 363, 901, 410], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 470, 875, 499], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 511, 583, 521], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [547, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 45, 210, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 93, 691, 105], \"label\": \"List-item\"}, {\"bbox_2d\": [282, 441, 636, 453], \"label\": \"List-item\"}, {\"bbox_2d\": [282, 125, 893, 416], \"label\": \"Text\"}, {\"bbox_2d\": [241, 472, 883, 735], \"label\": \"Text\"}, {\"bbox_2d\": [231, 819, 730, 832], \"label\": \"Text\"}, {\"bbox_2d\": [231, 894, 870, 921], \"label\": \"Text\"}, {\"bbox_2d\": [231, 932, 897, 959], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 780, 696, 797], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 859, 405, 875], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 987, 691, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 45, 211, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [546, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [281, 93, 705, 105], \"label\": \"List-item\"}, {\"bbox_2d\": [241, 123, 893, 415], \"label\": \"Text\"}, {\"bbox_2d\": [241, 471, 884, 734], \"label\": \"Text\"}, {\"bbox_2d\": [281, 441, 651, 453], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 932, 901, 959], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 894, 875, 921], \"label\": \"Text\"}, {\"bbox_2d\": [231, 820, 731, 832], \"label\": \"Text\"}, {\"bbox_2d\": [109, 778, 696, 797], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 859, 405, 875], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 86, 883, 181], \"label\": \"Title\"}, {\"bbox_2d\": [87, 191, 180, 207], \"label\": \"Text\"}, {\"bbox_2d\": [87, 219, 347, 236], \"label\": \"Text\"}, {\"bbox_2d\": [87, 247, 157, 265], \"label\": \"Text\"}, {\"bbox_2d\": [173, 275, 678, 293], \"label\": \"Text\"}, {\"bbox_2d\": [87, 303, 161, 321], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 331, 416, 349], \"label\": \"Text\"}, {\"bbox_2d\": [87, 409, 954, 445], \"label\": \"Text\"}, {\"bbox_2d\": [87, 451, 297, 469], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 636, 317, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 665, 472, 682], \"label\": \"Text\"}, {\"bbox_2d\": [87, 768, 908, 804], \"label\": \"Text\"}, {\"bbox_2d\": [87, 804, 373, 822], \"label\": \"Text\"}, {\"bbox_2d\": [87, 945, 947, 987], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 721, 691, 754], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 685, 964, 718], \"label\": \"List-item\"}, {\"bbox_2d\": [158, 826, 596, 844], \"label\": \"List-item\"}, {\"bbox_2d\": [158, 849, 934, 882], \"label\": \"List-item\"}, {\"bbox_2d\": [158, 887, 971, 919], \"label\": \"List-item\"}, {\"bbox_2d\": [158, 923, 814, 941], \"label\": \"List-item\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [87, 476, 954, 629], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 85, 883, 180], \"label\": \"Title\"}, {\"bbox_2d\": [87, 189, 179, 208], \"label\": \"Text\"}, {\"bbox_2d\": [87, 218, 347, 237], \"label\": \"Text\"}, {\"bbox_2d\": [87, 246, 156, 266], \"label\": \"Text\"}, {\"bbox_2d\": [87, 275, 792, 294], \"label\": \"Text\"}, {\"bbox_2d\": [87, 303, 160, 322], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 382, 231, 401], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 333, 417, 349], \"label\": \"Text\"}, {\"bbox_2d\": [87, 410, 951, 444], \"label\": \"Text\"}, {\"bbox_2d\": [87, 449, 296, 469], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 478, 954, 630], \"label\": \"Text\"}, {\"bbox_2d\": [87, 635, 319, 655], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 664, 584, 683], \"label\": \"Text\"}, {\"bbox_2d\": [87, 769, 908, 803], \"label\": \"Text\"}, {\"bbox_2d\": [91, 685, 965, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 720, 802, 754], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 805, 370, 824], \"label\": \"List-item\"}, {\"bbox_2d\": [143, 826, 707, 845], \"label\": \"List-item\"}, {\"bbox_2d\": [143, 847, 935, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [143, 882, 974, 917], \"label\": \"List-item\"}, {\"bbox_2d\": [143, 918, 927, 938], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 945, 951, 979], \"label\": \"List-item\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 83, 909, 97], \"label\": \"Text\"}, {\"bbox_2d\": [293, 109, 638, 121], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 123, 935, 494], \"label\": \"Table\"}, {\"bbox_2d\": [448, 395, 934, 565], \"label\": \"Text\"}, {\"bbox_2d\": [448, 578, 929, 663], \"label\": \"Text\"}, {\"bbox_2d\": [448, 676, 659, 689], \"label\": \"Text\"}, {\"bbox_2d\": [293, 769, 922, 810], \"label\": \"Text\"}, {\"bbox_2d\": [448, 824, 856, 851], \"label\": \"Text\"}, {\"bbox_2d\": [448, 865, 930, 892], \"label\": \"Text\"}, {\"bbox_2d\": [448, 739, 875, 766], \"label\": \"Text\"}, {\"bbox_2d\": [448, 716, 824, 728], \"label\": \"List-item\"}, {\"bbox_2d\": [448, 695, 814, 708], \"label\": \"List-item\"}, {\"bbox_2d\": [644, 978, 937, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 903, 98], \"label\": \"Text\"}, {\"bbox_2d\": [292, 121, 932, 897], \"label\": \"Table\"}, {\"bbox_2d\": [293, 110, 634, 121], \"label\": \"Caption\"}, {\"bbox_2d\": [644, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 26, 351, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 94, 223, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 113, 471, 124], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 369, 207, 380], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 495, 380, 506], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 419, 697, 430], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 336, 588, 347], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 241, 655, 252], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 113, 945, 168], \"label\": \"Text\"}, {\"bbox_2d\": [527, 173, 945, 225], \"label\": \"Text\"}, {\"bbox_2d\": [91, 131, 508, 225], \"label\": \"Text\"}, {\"bbox_2d\": [91, 229, 508, 284], \"label\": \"Text\"}, {\"bbox_2d\": [91, 288, 508, 310], \"label\": \"Text\"}, {\"bbox_2d\": [91, 314, 508, 355], \"label\": \"Text\"}, {\"bbox_2d\": [91, 384, 508, 482], \"label\": \"Text\"}, {\"bbox_2d\": [527, 351, 945, 406], \"label\": \"Text\"}, {\"bbox_2d\": [527, 434, 945, 532], \"label\": \"Text\"}, {\"bbox_2d\": [527, 535, 945, 576], \"label\": \"Text\"}, {\"bbox_2d\": [527, 579, 945, 677], \"label\": \"Text\"}, {\"bbox_2d\": [91, 509, 508, 551], \"label\": \"Text\"}, {\"bbox_2d\": [91, 554, 508, 577], \"label\": \"Text\"}, {\"bbox_2d\": [91, 580, 508, 603], \"label\": \"Text\"}, {\"bbox_2d\": [91, 608, 508, 664], \"label\": \"Text\"}, {\"bbox_2d\": [527, 690, 795, 709], \"label\": \"Section-header\"}, {\"bbox_2d\": [546, 725, 922, 786], \"label\": \"Picture\"}, {\"bbox_2d\": [546, 847, 721, 862], \"label\": \"Section-header\"}, {\"bbox_2d\": [546, 879, 916, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [99, 690, 489, 786], \"label\": \"Picture\"}, {\"bbox_2d\": [99, 847, 494, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [203, 807, 337, 816], \"label\": \"Caption\"}, {\"bbox_2d\": [106, 806, 143, 816], \"label\": \"Caption\"}, {\"bbox_2d\": [544, 806, 775, 816], \"label\": \"Caption\"}, {\"bbox_2d\": [544, 859, 651, 867], \"label\": \"Caption\"}, {\"bbox_2d\": [316, 911, 471, 945], \"label\": \"Caption\"}, {\"bbox_2d\": [106, 911, 132, 945], \"label\": \"Caption\"}, {\"bbox_2d\": [106, 911, 245, 945], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 999, 444, 1008], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 223, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 115, 471, 125], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 338, 589, 348], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 370, 206, 380], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 495, 379, 505], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 421, 697, 431], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 129, 507, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 230, 507, 283], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 287, 507, 312], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 316, 507, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 384, 507, 480], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 510, 507, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [118, 553, 507, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [118, 582, 507, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 610, 507, 664], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 115, 944, 168], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 172, 944, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 241, 655, 251], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 256, 944, 323], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 353, 944, 406], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 435, 944, 531], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 536, 944, 575], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 579, 944, 675], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [88, 678, 510, 825], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 678, 948, 827], \"label\": \"Picture\"}, {\"bbox_2d\": [87, 835, 512, 962], \"label\": \"Picture\"}, {\"bbox_2d\": [522, 834, 949, 963], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [124, 101, 655, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 118, 510, 162], \"label\": \"Text\"}, {\"bbox_2d\": [564, 118, 935, 147], \"label\": \"Text\"}, {\"bbox_2d\": [121, 254, 680, 272], \"label\": \"Text\"}, {\"bbox_2d\": [121, 342, 538, 354], \"label\": \"Text\"}, {\"bbox_2d\": [124, 322, 198, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [862, 1002, 975, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [119, 183, 964, 251], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [125, 102, 654, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 324, 198, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 120, 516, 162], \"label\": \"Text\"}, {\"bbox_2d\": [566, 120, 941, 146], \"label\": \"Text\"}, {\"bbox_2d\": [120, 256, 681, 275], \"label\": \"Text\"}, {\"bbox_2d\": [120, 343, 539, 354], \"label\": \"Text\"}, {\"bbox_2d\": [119, 190, 962, 251], \"label\": \"Table\"}, {\"bbox_2d\": [883, 178, 963, 186], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 44, 432, 62], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 401, 36, 637], \"label\": \"Page-header\"}, {\"bbox_2d\": [507, 997, 529, 1010], \"label\": \"Page-footer\"}, {\"bbox_2d\": [526, 131, 622, 146], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 164, 965, 195], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 86, 318, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 208, 845, 248], \"label\": \"Text\"}, {\"bbox_2d\": [526, 345, 730, 376], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 389, 763, 417], \"label\": \"Text\"}, {\"bbox_2d\": [526, 433, 782, 494], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 557, 730, 572], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 636, 688, 651], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 695, 762, 721], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 737, 727, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 767, 638, 780], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 795, 642, 809], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 772, 280, 807], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 819, 276, 834], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 629, 321, 645], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 458, 206, 490], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 345, 267, 376], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 313, 267, 328], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 131, 214, 146], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 158, 511, 283], \"label\": \"Picture\"}, {\"bbox_2d\": [302, 328, 457, 341], \"label\": \"Section-header\"}, {\"bbox_2d\": [302, 360, 352, 412], \"label\": \"Picture\"}, {\"bbox_2d\": [302, 428, 409, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [302, 456, 457, 471], \"label\": \"Section-header\"}, {\"bbox_2d\": [302, 486, 352, 536], \"label\": \"Picture\"}, {\"bbox_2d\": [302, 551, 352, 612], \"label\": \"Picture\"}, {\"bbox_2d\": [302, 665, 457, 741], \"label\": \"Picture\"}, {\"bbox_2d\": [302, 774, 474, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 792, 642, 809], \"label\": \"Text\"}, {\"bbox_2d\": [526, 767, 638, 780], \"label\": \"Text\"}, {\"bbox_2d\": [526, 636, 688, 651], \"label\": \"Text\"}, {\"bbox_2d\": [526, 557, 730, 572], \"label\": \"Text\"}, {\"bbox_2d\": [526, 616, 837, 635], \"label\": \"Text\"}, {\"bbox_2d\": [526, 433, 782, 494], \"label\": \"Text\"}, {\"bbox_2d\": [526, 389, 763, 417], \"label\": \"Text\"}, {\"bbox_2d\": [798, 328, 950, 341], \"label\": \"Section-header\"}, {\"bbox_2d\": [798, 378, 950, 409], \"label\": \"Section-header\"}, {\"bbox_2d\": [798, 456, 958, 541], \"label\": \"Picture\"}, {\"bbox_2d\": [798, 567, 958, 606], \"label\": \"Picture\"}, {\"bbox_2d\": [798, 678, 958, 710], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 433, 782, 494], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [60, 41, 431, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [529, 129, 622, 148], \"label\": \"Section-header\"}, {\"bbox_2d\": [63, 82, 259, 101], \"label\": \"Section-header\"}, {\"bbox_2d\": [63, 98, 318, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [63, 129, 213, 148], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 993, 527, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [19, 402, 37, 633], \"label\": \"Page-header\"}, {\"bbox_2d\": [529, 160, 970, 195], \"label\": \"Section-header\"}, {\"bbox_2d\": [529, 206, 844, 250], \"label\": \"Text\"}, {\"bbox_2d\": [62, 159, 512, 281], \"label\": \"Picture\"}, {\"bbox_2d\": [56, 304, 958, 866], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [55, 49, 516, 88], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 110, 176, 125], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 391, 246, 406], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 408, 132, 423], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 128, 163, 387], \"label\": \"Text\"}, {\"bbox_2d\": [292, 110, 495, 301], \"label\": \"Text\"}, {\"bbox_2d\": [532, 110, 743, 332], \"label\": \"Text\"}, {\"bbox_2d\": [532, 334, 723, 377], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 379, 567, 388], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 389, 567, 398], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 399, 567, 408], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 410, 567, 419], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 420, 567, 430], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 431, 567, 440], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 442, 567, 452], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 453, 567, 463], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 464, 567, 473], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 474, 567, 483], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 484, 567, 494], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 495, 567, 504], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 505, 567, 514], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 515, 567, 525], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 526, 567, 535], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 536, 567, 545], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 546, 567, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 557, 567, 566], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 567, 567, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 577, 567, 587], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 588, 567, 597], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 598, 567, 607], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 612, 946, 669], \"label\": \"Text\"}, {\"bbox_2d\": [55, 662, 221, 677], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 680, 111, 690], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 691, 261, 895], \"label\": \"Text\"}, {\"bbox_2d\": [55, 901, 211, 916], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 919, 261, 976], \"label\": \"Text\"}, {\"bbox_2d\": [292, 679, 986, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [814, 1002, 997, 1017], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [55, 57, 515, 86], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 114, 181, 124], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 309, 488, 319], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 338, 723, 349], \"label\": \"Section-header\"}, {\"bbox_2d\": [771, 338, 951, 349], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 397, 251, 407], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 548, 491, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 548, 714, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 665, 224, 676], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 904, 215, 915], \"label\": \"Section-header\"}, {\"bbox_2d\": [865, 1003, 1005, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [771, 114, 975, 329], \"label\": \"Text\"}, {\"bbox_2d\": [532, 114, 743, 329], \"label\": \"Text\"}, {\"bbox_2d\": [293, 114, 506, 300], \"label\": \"Text\"}, {\"bbox_2d\": [55, 128, 264, 387], \"label\": \"Text\"}, {\"bbox_2d\": [293, 324, 503, 539], \"label\": \"Text\"}, {\"bbox_2d\": [532, 353, 737, 539], \"label\": \"Text\"}, {\"bbox_2d\": [771, 353, 972, 597], \"label\": \"Text\"}, {\"bbox_2d\": [532, 563, 695, 588], \"label\": \"Text\"}, {\"bbox_2d\": [293, 563, 463, 588], \"label\": \"Text\"}, {\"bbox_2d\": [55, 411, 266, 656], \"label\": \"Text\"}, {\"bbox_2d\": [296, 615, 948, 668], \"label\": \"Text\"}, {\"bbox_2d\": [55, 680, 259, 895], \"label\": \"Text\"}, {\"bbox_2d\": [55, 919, 266, 973], \"label\": \"Text\"}, {\"bbox_2d\": [294, 679, 987, 980], \"label\": \"Picture\"}, {\"bbox_2d\": [812, 1001, 861, 1012], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 47, 691, 62], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 140, 946, 194], \"label\": \"Text\"}, {\"bbox_2d\": [91, 207, 946, 251], \"label\": \"Text\"}, {\"bbox_2d\": [91, 306, 946, 346], \"label\": \"Text\"}, {\"bbox_2d\": [91, 359, 946, 403], \"label\": \"Text\"}, {\"bbox_2d\": [91, 412, 946, 511], \"label\": \"Text\"}, {\"bbox_2d\": [91, 569, 414, 579], \"label\": \"Text\"}, {\"bbox_2d\": [91, 819, 508, 827], \"label\": \"Text\"}, {\"bbox_2d\": [512, 988, 526, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [70, 590, 965, 812], \"label\": \"Table\"}, {\"bbox_2d\": [60, 282, 371, 294], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 544, 420, 556], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 47, 809, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 141, 944, 195], \"label\": \"Text\"}, {\"bbox_2d\": [91, 208, 944, 248], \"label\": \"Text\"}, {\"bbox_2d\": [91, 307, 944, 347], \"label\": \"Text\"}, {\"bbox_2d\": [91, 359, 944, 399], \"label\": \"Text\"}, {\"bbox_2d\": [91, 412, 944, 510], \"label\": \"Text\"}, {\"bbox_2d\": [91, 820, 510, 828], \"label\": \"Text\"}, {\"bbox_2d\": [510, 988, 527, 1005], \"label\": \"Page-footer\"}, {\"bbox_2d\": [62, 545, 422, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 283, 371, 294], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 592, 309, 599], \"label\": \"Text\"}, {\"bbox_2d\": [91, 602, 944, 811], \"label\": \"Table\"}, {\"bbox_2d\": [91, 568, 415, 580], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 71, 506, 119], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 157], \"label\": \"Text\"}, {\"bbox_2d\": [91, 176, 506, 223], \"label\": \"Text\"}, {\"bbox_2d\": [91, 223, 506, 434], \"label\": \"Text\"}, {\"bbox_2d\": [536, 133, 951, 267], \"label\": \"Text\"}, {\"bbox_2d\": [536, 270, 951, 446], \"label\": \"Text\"}, {\"bbox_2d\": [536, 451, 951, 691], \"label\": \"Text\"}, {\"bbox_2d\": [536, 691, 951, 841], \"label\": \"Text\"}, {\"bbox_2d\": [91, 490, 506, 590], \"label\": \"Text\"}, {\"bbox_2d\": [91, 594, 506, 743], \"label\": \"Text\"}, {\"bbox_2d\": [91, 782, 506, 809], \"label\": \"Text\"}, {\"bbox_2d\": [536, 874, 951, 950], \"label\": \"Text\"}, {\"bbox_2d\": [91, 874, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [209, 842, 387, 853], \"label\": \"Section-header\"}, {\"bbox_2d\": [211, 129, 506, 163], \"label\": \"Formula\"}, {\"bbox_2d\": [132, 449, 506, 473], \"label\": \"Formula\"}, {\"bbox_2d\": [184, 755, 506, 770], \"label\": \"Formula\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [536, 71, 951, 129], \"label\": \"Text\"}, {\"bbox_2d\": [536, 132, 951, 265], \"label\": \"Text\"}, {\"bbox_2d\": [158, 129, 506, 172], \"label\": \"Formula\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 221, 506, 432], \"label\": \"Text\"}, {\"bbox_2d\": [536, 269, 951, 446], \"label\": \"Text\"}, {\"bbox_2d\": [91, 489, 506, 592], \"label\": \"Text\"}, {\"bbox_2d\": [536, 450, 951, 688], \"label\": \"Text\"}, {\"bbox_2d\": [91, 594, 506, 742], \"label\": \"Text\"}, {\"bbox_2d\": [536, 691, 951, 839], \"label\": \"Text\"}, {\"bbox_2d\": [91, 782, 506, 819], \"label\": \"Text\"}, {\"bbox_2d\": [91, 874, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 874, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [211, 841, 386, 853], \"label\": \"Section-header\"}, {\"bbox_2d\": [193, 752, 506, 768], \"label\": \"Formula\"}, {\"bbox_2d\": [131, 445, 506, 484], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 681, 911, 752], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [122, 91, 915, 648], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [124, 90, 917, 653], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 683, 910, 752], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [680, 17, 915, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [952, 14, 998, 27], \"label\": \"Page-header\"}, {\"bbox_2d\": [863, 998, 992, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [195, 957, 672, 967], \"label\": \"List-item\"}, {\"bbox_2d\": [195, 967, 565, 977], \"label\": \"List-item\"}, {\"bbox_2d\": [160, 945, 832, 956], \"label\": \"Text\"}, {\"bbox_2d\": [158, 98, 964, 493], \"label\": \"Table\"}, {\"bbox_2d\": [158, 507, 965, 937], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [681, 18, 997, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [159, 102, 964, 493], \"label\": \"Table\"}, {\"bbox_2d\": [160, 510, 967, 939], \"label\": \"Table\"}, {\"bbox_2d\": [161, 947, 832, 956], \"label\": \"Text\"}, {\"bbox_2d\": [195, 959, 783, 979], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [112, 101, 463, 140], \"label\": \"Text\"}, {\"bbox_2d\": [112, 148, 476, 217], \"label\": \"Text\"}, {\"bbox_2d\": [112, 224, 477, 309], \"label\": \"Text\"}, {\"bbox_2d\": [112, 341, 469, 380], \"label\": \"Text\"}, {\"bbox_2d\": [112, 402, 468, 442], \"label\": \"Text\"}, {\"bbox_2d\": [112, 464, 466, 532], \"label\": \"Text\"}, {\"bbox_2d\": [112, 538, 476, 590], \"label\": \"Text\"}, {\"bbox_2d\": [112, 597, 468, 652], \"label\": \"Text\"}, {\"bbox_2d\": [112, 659, 477, 763], \"label\": \"Text\"}, {\"bbox_2d\": [112, 792, 476, 847], \"label\": \"Text\"}, {\"bbox_2d\": [112, 865, 468, 968], \"label\": \"Text\"}, {\"bbox_2d\": [546, 121, 912, 170], \"label\": \"Text\"}, {\"bbox_2d\": [546, 192, 908, 231], \"label\": \"Text\"}, {\"bbox_2d\": [546, 253, 910, 307], \"label\": \"Text\"}, {\"bbox_2d\": [546, 315, 911, 415], \"label\": \"Text\"}, {\"bbox_2d\": [546, 417, 910, 473], \"label\": \"Text\"}, {\"bbox_2d\": [546, 507, 892, 531], \"label\": \"Text\"}, {\"bbox_2d\": [546, 543, 901, 589], \"label\": \"Text\"}, {\"bbox_2d\": [546, 601, 911, 652], \"label\": \"Text\"}, {\"bbox_2d\": [546, 675, 893, 743], \"label\": \"Text\"}, {\"bbox_2d\": [546, 745, 899, 849], \"label\": \"Text\"}, {\"bbox_2d\": [546, 852, 906, 966], \"label\": \"Text\"}, {\"bbox_2d\": [112, 852, 308, 863], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 778, 351, 788], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 656, 653, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 589, 707, 599], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 538, 814, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 478, 650, 488], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 307, 735, 317], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 237, 723, 247], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 178, 768, 188], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 101, 770, 111], \"label\": \"List-item\"}, {\"bbox_2d\": [112, 328, 285, 338], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 388, 285, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 448, 352, 458], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 411, 466, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 301, 285, 311], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 19, 242, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [37, 17, 83, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 999, 169, 1015], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [130, 20, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [38, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [113, 101, 468, 142], \"label\": \"Text\"}, {\"bbox_2d\": [113, 146, 480, 217], \"label\": \"Text\"}, {\"bbox_2d\": [113, 222, 482, 322], \"label\": \"Text\"}, {\"bbox_2d\": [113, 537, 481, 592], \"label\": \"Text\"}, {\"bbox_2d\": [113, 597, 471, 652], \"label\": \"Text\"}, {\"bbox_2d\": [113, 657, 481, 772], \"label\": \"Text\"}, {\"bbox_2d\": [95, 1003, 170, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [58, 1001, 86, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [566, 852, 917, 968], \"label\": \"Text\"}, {\"bbox_2d\": [566, 747, 912, 848], \"label\": \"Text\"}, {\"bbox_2d\": [86, 327, 470, 382], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 387, 470, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 447, 473, 532], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 101, 918, 172], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 177, 912, 232], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 237, 916, 307], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 312, 916, 412], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 477, 651, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 492, 897, 532], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 537, 906, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 597, 912, 652], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 657, 654, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 672, 898, 742], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 777, 482, 848], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 852, 474, 968], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 417, 915, 472], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 108, 885, 249], \"label\": \"Text\"}, {\"bbox_2d\": [147, 287, 885, 453], \"label\": \"Text\"}, {\"bbox_2d\": [147, 491, 888, 657], \"label\": \"Text\"}, {\"bbox_2d\": [147, 694, 887, 860], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [147, 109, 887, 249], \"label\": \"Text\"}, {\"bbox_2d\": [148, 287, 889, 452], \"label\": \"Text\"}, {\"bbox_2d\": [148, 490, 887, 656], \"label\": \"Text\"}, {\"bbox_2d\": [148, 694, 888, 859], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 36, 976, 59], \"label\": \"Text\"}, {\"bbox_2d\": [25, 63, 322, 74], \"label\": \"Section-header\"}, {\"bbox_2d\": [42, 81, 181, 91], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 104, 1007, 147], \"label\": \"Text\"}, {\"bbox_2d\": [42, 155, 215, 166], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 172, 989, 205], \"label\": \"Text\"}, {\"bbox_2d\": [42, 212, 120, 222], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 256, 327, 267], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 275, 743, 285], \"label\": \"Text\"}, {\"bbox_2d\": [25, 230, 956, 250], \"label\": \"Text\"}, {\"bbox_2d\": [25, 429, 937, 451], \"label\": \"Text\"}, {\"bbox_2d\": [25, 301, 1008, 417], \"label\": \"Table\"}, {\"bbox_2d\": [508, 464, 525, 474], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 36, 976, 58], \"label\": \"Text\"}, {\"bbox_2d\": [25, 104, 1007, 148], \"label\": \"Text\"}, {\"bbox_2d\": [25, 172, 1003, 205], \"label\": \"Text\"}, {\"bbox_2d\": [25, 228, 956, 250], \"label\": \"Text\"}, {\"bbox_2d\": [55, 274, 742, 285], \"label\": \"Text\"}, {\"bbox_2d\": [25, 429, 938, 451], \"label\": \"Text\"}, {\"bbox_2d\": [25, 64, 322, 75], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 81, 180, 92], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 155, 215, 165], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 211, 121, 222], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 257, 327, 268], \"label\": \"Section-header\"}, {\"bbox_2d\": [26, 300, 1011, 417], \"label\": \"Table\"}, {\"bbox_2d\": [508, 464, 525, 475], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 27, 388, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 94, 508, 148], \"label\": \"Text\"}, {\"bbox_2d\": [528, 94, 943, 135], \"label\": \"Text\"}, {\"bbox_2d\": [528, 160, 943, 214], \"label\": \"Text\"}, {\"bbox_2d\": [91, 190, 508, 244], \"label\": \"Text\"}, {\"bbox_2d\": [91, 255, 508, 309], \"label\": \"Text\"}, {\"bbox_2d\": [91, 335, 508, 404], \"label\": \"Text\"}, {\"bbox_2d\": [91, 428, 508, 455], \"label\": \"Text\"}, {\"bbox_2d\": [528, 237, 943, 336], \"label\": \"Text\"}, {\"bbox_2d\": [528, 509, 943, 536], \"label\": \"Text\"}, {\"bbox_2d\": [528, 579, 943, 638], \"label\": \"Text\"}, {\"bbox_2d\": [528, 647, 943, 730], \"label\": \"Text\"}, {\"bbox_2d\": [528, 756, 943, 795], \"label\": \"Text\"}, {\"bbox_2d\": [528, 820, 943, 847], \"label\": \"Text\"}, {\"bbox_2d\": [91, 814, 508, 941], \"label\": \"Text\"}, {\"bbox_2d\": [91, 576, 508, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 656, 508, 696], \"label\": \"Text\"}, {\"bbox_2d\": [91, 707, 508, 790], \"label\": \"Text\"}, {\"bbox_2d\": [528, 552, 827, 563], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 567, 697, 578], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 641, 271, 652], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 414, 207, 425], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 321, 210, 331], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 224, 619, 235], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 145, 807, 156], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 170, 251, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 176, 149, 186], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 807, 622, 817], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 801, 345, 811], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 999, 451, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [527, 851, 947, 929], \"label\": \"Table\"}, {\"bbox_2d\": [91, 458, 508, 563], \"label\": \"Table\"}, {\"bbox_2d\": [528, 343, 948, 501], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 389, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 507, 148], \"label\": \"Text\"}, {\"bbox_2d\": [91, 190, 507, 244], \"label\": \"Text\"}, {\"bbox_2d\": [91, 176, 150, 186], \"label\": \"Text\"}, {\"bbox_2d\": [91, 256, 507, 309], \"label\": \"Text\"}, {\"bbox_2d\": [91, 321, 211, 331], \"label\": \"Text\"}, {\"bbox_2d\": [91, 336, 507, 403], \"label\": \"Text\"}, {\"bbox_2d\": [91, 415, 207, 425], \"label\": \"Text\"}, {\"bbox_2d\": [91, 430, 507, 454], \"label\": \"Text\"}, {\"bbox_2d\": [91, 576, 507, 630], \"label\": \"Text\"}, {\"bbox_2d\": [91, 656, 507, 695], \"label\": \"Text\"}, {\"bbox_2d\": [91, 707, 507, 789], \"label\": \"Text\"}, {\"bbox_2d\": [91, 816, 507, 941], \"label\": \"Text\"}, {\"bbox_2d\": [528, 821, 944, 845], \"label\": \"Text\"}, {\"bbox_2d\": [528, 756, 944, 794], \"label\": \"Text\"}, {\"bbox_2d\": [528, 647, 944, 729], \"label\": \"Text\"}, {\"bbox_2d\": [528, 582, 944, 635], \"label\": \"Text\"}, {\"bbox_2d\": [528, 511, 944, 535], \"label\": \"Text\"}, {\"bbox_2d\": [528, 240, 944, 337], \"label\": \"Text\"}, {\"bbox_2d\": [528, 160, 944, 214], \"label\": \"Text\"}, {\"bbox_2d\": [528, 95, 944, 134], \"label\": \"Text\"}, {\"bbox_2d\": [528, 146, 808, 156], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 226, 620, 236], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 553, 828, 563], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 567, 702, 577], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 741, 862, 751], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 807, 622, 817], \"label\": \"Text\"}, {\"bbox_2d\": [91, 801, 345, 811], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 642, 271, 652], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 160, 251, 170], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 457, 512, 563], \"label\": \"Table\"}, {\"bbox_2d\": [91, 1000, 451, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [525, 850, 953, 930], \"label\": \"Table\"}, {\"bbox_2d\": [524, 339, 947, 496], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [306, 54, 326, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [464, 54, 596, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [737, 54, 757, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [553, 87, 937, 185], \"label\": \"Text\"}, {\"bbox_2d\": [123, 86, 506, 363], \"label\": \"Text\"}, {\"bbox_2d\": [123, 361, 506, 477], \"label\": \"Text\"}, {\"bbox_2d\": [553, 186, 937, 606], \"label\": \"Text\"}, {\"bbox_2d\": [123, 476, 506, 564], \"label\": \"Text\"}, {\"bbox_2d\": [553, 606, 937, 636], \"label\": \"Text\"}, {\"bbox_2d\": [553, 634, 937, 899], \"label\": \"Text\"}, {\"bbox_2d\": [123, 565, 506, 926], \"label\": \"Text\"}, {\"bbox_2d\": [553, 897, 937, 926], \"label\": \"Text\"}, {\"bbox_2d\": [523, 988, 532, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [305, 55, 328, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [463, 54, 595, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [736, 55, 759, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 84, 523, 359], \"label\": \"Text\"}, {\"bbox_2d\": [123, 360, 521, 475], \"label\": \"Text\"}, {\"bbox_2d\": [123, 476, 513, 562], \"label\": \"Text\"}, {\"bbox_2d\": [123, 563, 522, 852], \"label\": \"Text\"}, {\"bbox_2d\": [123, 854, 516, 925], \"label\": \"Text\"}, {\"bbox_2d\": [554, 868, 946, 925], \"label\": \"Text\"}, {\"bbox_2d\": [554, 84, 964, 184], \"label\": \"Text\"}, {\"bbox_2d\": [553, 186, 955, 635], \"label\": \"Text\"}, {\"bbox_2d\": [554, 636, 953, 867], \"label\": \"Text\"}, {\"bbox_2d\": [525, 989, 533, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [337, 478, 754, 565], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [341, 476, 720, 557], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 99, 595, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 132, 476, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 164, 774, 176], \"label\": \"Text\"}, {\"bbox_2d\": [122, 195, 914, 776], \"label\": \"Table\"}, {\"bbox_2d\": [496, 955, 543, 970], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 599, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 479, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [493, 953, 546, 971], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 161, 778, 177], \"label\": \"Text\"}, {\"bbox_2d\": [151, 187, 916, 772], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [585, 68, 841, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 141, 838, 195], \"label\": \"Text\"}, {\"bbox_2d\": [152, 225, 838, 344], \"label\": \"Text\"}, {\"bbox_2d\": [152, 364, 838, 503], \"label\": \"Text\"}, {\"bbox_2d\": [152, 533, 838, 587], \"label\": \"Text\"}, {\"bbox_2d\": [152, 617, 838, 651], \"label\": \"Text\"}, {\"bbox_2d\": [152, 701, 838, 777], \"label\": \"Text\"}, {\"bbox_2d\": [152, 807, 838, 841], \"label\": \"Text\"}, {\"bbox_2d\": [152, 871, 838, 926], \"label\": \"Text\"}, {\"bbox_2d\": [903, 947, 924, 960], \"label\": \"Page-footer\"}, {\"bbox_2d\": [893, 961, 998, 1018], \"label\": \"Picture\"}, {\"bbox_2d\": [149, 682, 936, 693], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 364, 939, 375], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [586, 69, 844, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [151, 140, 838, 195], \"label\": \"Text\"}, {\"bbox_2d\": [151, 225, 838, 343], \"label\": \"Text\"}, {\"bbox_2d\": [151, 363, 942, 502], \"label\": \"Text\"}, {\"bbox_2d\": [151, 533, 838, 587], \"label\": \"Text\"}, {\"bbox_2d\": [151, 617, 838, 650], \"label\": \"Text\"}, {\"bbox_2d\": [151, 702, 838, 777], \"label\": \"Text\"}, {\"bbox_2d\": [151, 808, 838, 841], \"label\": \"Text\"}, {\"bbox_2d\": [151, 871, 838, 925], \"label\": \"Text\"}, {\"bbox_2d\": [906, 948, 927, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [885, 960, 998, 1020], \"label\": \"Picture\"}, {\"bbox_2d\": [177, 683, 940, 693], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [229, 88, 460, 100], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 109, 492, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [218, 130, 878, 157], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 164, 878, 205], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 212, 760, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 233, 878, 260], \"label\": \"List-item\"}, {\"bbox_2d\": [222, 268, 878, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 376, 642, 389], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 396, 463, 409], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 416, 436, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 436, 732, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 455, 805, 468], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 475, 321, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 495, 635, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [222, 515, 629, 527], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 609, 438, 622], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 629, 878, 671], \"label\": \"List-item\"}, {\"bbox_2d\": [224, 678, 878, 721], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 728, 878, 800], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 807, 878, 850], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 857, 878, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 548, 538, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 574, 878, 603], \"label\": \"Text\"}, {\"bbox_2d\": [155, 313, 452, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [186, 164, 878, 205], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 212, 760, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 233, 878, 260], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 268, 629, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 313, 563, 342], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 548, 538, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [186, 376, 642, 389], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 396, 463, 409], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 416, 436, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 436, 732, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 455, 805, 468], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 475, 321, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 495, 635, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [222, 515, 629, 527], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 609, 878, 622], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 728, 878, 800], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 807, 878, 850], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 678, 878, 721], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 728, 878, 800], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 807, 878, 850], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 857, 878, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [507, 979, 527, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [228, 91, 465, 103], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 111, 497, 123], \"label\": \"List-item\"}, {\"bbox_2d\": [217, 131, 884, 157], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 165, 884, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 215, 765, 226], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 234, 884, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 269, 884, 296], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 378, 647, 389], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 397, 468, 409], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 417, 440, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 437, 736, 449], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 457, 810, 469], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 477, 326, 488], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 496, 640, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 516, 635, 528], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 610, 441, 622], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 630, 884, 671], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 679, 884, 721], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 729, 884, 800], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 808, 884, 850], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 858, 884, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 551, 543, 563], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 319, 457, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 343, 884, 370], \"label\": \"Text\"}, {\"bbox_2d\": [156, 575, 884, 602], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [87, 76, 991, 958], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 77, 990, 958], \"label\": \"Table\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 25, 351, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 94, 507, 120], \"label\": \"Text\"}, {\"bbox_2d\": [91, 130, 507, 258], \"label\": \"Text\"}, {\"bbox_2d\": [91, 268, 507, 336], \"label\": \"Text\"}, {\"bbox_2d\": [91, 348, 507, 461], \"label\": \"Text\"}, {\"bbox_2d\": [91, 475, 160, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 499, 444, 525], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 528, 507, 568], \"label\": \"Text\"}, {\"bbox_2d\": [91, 571, 507, 614], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 615, 507, 685], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 686, 507, 743], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 93, 945, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 201, 945, 227], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 231, 945, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 275, 945, 331], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 331, 945, 417], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 425, 945, 468], \"label\": \"Text\"}, {\"bbox_2d\": [526, 478, 945, 547], \"label\": \"Text\"}, {\"bbox_2d\": [526, 559, 846, 571], \"label\": \"Text\"}, {\"bbox_2d\": [526, 571, 945, 656], \"label\": \"Text\"}, {\"bbox_2d\": [526, 657, 945, 685], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 686, 923, 700], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 700, 834, 713], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 729, 758, 741], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 757, 472, 798], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 742, 945, 799], \"label\": \"Text\"}, {\"bbox_2d\": [526, 800, 945, 842], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 843, 945, 871], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 872, 945, 898], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 900, 945, 942], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 844, 507, 955], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 800, 507, 842], \"label\": \"Text\"}, {\"bbox_2d\": [91, 999, 445, 1008], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 507, 119], \"label\": \"Text\"}, {\"bbox_2d\": [91, 132, 507, 257], \"label\": \"Text\"}, {\"bbox_2d\": [91, 269, 507, 337], \"label\": \"Text\"}, {\"bbox_2d\": [91, 349, 507, 460], \"label\": \"Text\"}, {\"bbox_2d\": [91, 529, 507, 568], \"label\": \"Text\"}, {\"bbox_2d\": [91, 801, 507, 840], \"label\": \"Text\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [528, 744, 944, 797], \"label\": \"Text\"}, {\"bbox_2d\": [528, 558, 870, 568], \"label\": \"Text\"}, {\"bbox_2d\": [528, 478, 944, 546], \"label\": \"Text\"}, {\"bbox_2d\": [528, 427, 944, 466], \"label\": \"Text\"}, {\"bbox_2d\": [528, 204, 944, 228], \"label\": \"Text\"}, {\"bbox_2d\": [528, 95, 944, 191], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 232, 944, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 275, 944, 329], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 333, 944, 415], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 572, 944, 654], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 659, 944, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 688, 835, 698], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 702, 835, 712], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 801, 944, 840], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 845, 944, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 873, 944, 898], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 902, 944, 941], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 845, 507, 955], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 688, 507, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 616, 507, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 572, 507, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 477, 159, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 500, 444, 525], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 758, 472, 797], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 729, 758, 739], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [86, 28, 210, 44], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 47, 438, 62], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 153, 427, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 479, 182, 494], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 555, 202, 569], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 65, 627, 77], \"label\": \"Text\"}, {\"bbox_2d\": [86, 95, 950, 140], \"label\": \"Text\"}, {\"bbox_2d\": [86, 170, 964, 294], \"label\": \"Text\"}, {\"bbox_2d\": [86, 309, 954, 465], \"label\": \"Text\"}, {\"bbox_2d\": [86, 494, 954, 540], \"label\": \"Text\"}, {\"bbox_2d\": [86, 572, 958, 711], \"label\": \"Text\"}, {\"bbox_2d\": [801, 996, 965, 1005], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 22, 439, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 62, 628, 78], \"label\": \"Text\"}, {\"bbox_2d\": [85, 92, 954, 139], \"label\": \"Text\"}, {\"bbox_2d\": [85, 167, 969, 294], \"label\": \"Text\"}, {\"bbox_2d\": [85, 149, 426, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 306, 958, 464], \"label\": \"Text\"}, {\"bbox_2d\": [85, 493, 958, 540], \"label\": \"Text\"}, {\"bbox_2d\": [85, 568, 962, 711], \"label\": \"Text\"}, {\"bbox_2d\": [802, 996, 965, 1006], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 475, 182, 493], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 550, 202, 569], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [464, 53, 596, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [300, 224, 654, 777], \"label\": \"Picture\"}, {\"bbox_2d\": [678, 770, 772, 800], \"label\": \"Caption\"}, {\"bbox_2d\": [526, 989, 533, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [464, 53, 596, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [525, 987, 534, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [239, 187, 822, 801], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [781, 62, 987, 94], \"label\": \"Page-header\"}, {\"bbox_2d\": [569, 106, 987, 627], \"label\": \"Text\"}, {\"bbox_2d\": [117, 109, 533, 479], \"label\": \"Text\"}, {\"bbox_2d\": [117, 482, 533, 609], \"label\": \"Text\"}, {\"bbox_2d\": [117, 653, 136, 668], \"label\": \"Text\"}, {\"bbox_2d\": [117, 703, 170, 718], \"label\": \"Text\"}, {\"bbox_2d\": [117, 855, 533, 929], \"label\": \"Text\"}, {\"bbox_2d\": [538, 630, 987, 977], \"label\": \"Text\"}, {\"bbox_2d\": [157, 733, 533, 844], \"label\": \"Formula\"}, {\"bbox_2d\": [274, 676, 385, 692], \"label\": \"Formula\"}, {\"bbox_2d\": [274, 626, 353, 641], \"label\": \"Formula\"}, {\"bbox_2d\": [249, 949, 401, 972], \"label\": \"Formula\"}, {\"bbox_2d\": [538, 988, 565, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [782, 63, 1008, 94], \"label\": \"Page-header\"}, {\"bbox_2d\": [118, 107, 542, 480], \"label\": \"Text\"}, {\"bbox_2d\": [572, 106, 1003, 628], \"label\": \"Text\"}, {\"bbox_2d\": [118, 481, 543, 611], \"label\": \"Text\"}, {\"bbox_2d\": [571, 629, 1004, 977], \"label\": \"Text\"}, {\"bbox_2d\": [116, 855, 537, 929], \"label\": \"Text\"}, {\"bbox_2d\": [117, 702, 179, 720], \"label\": \"Text\"}, {\"bbox_2d\": [117, 652, 142, 670], \"label\": \"Text\"}, {\"bbox_2d\": [275, 625, 355, 643], \"label\": \"Formula\"}, {\"bbox_2d\": [274, 679, 381, 696], \"label\": \"Formula\"}, {\"bbox_2d\": [155, 731, 537, 846], \"label\": \"Formula\"}, {\"bbox_2d\": [250, 950, 384, 968], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [59, 154, 260, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [59, 154, 260, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [320, 47, 892, 86], \"label\": \"Section-header\"}, {\"bbox_2d\": [320, 105, 880, 133], \"label\": \"Text\"}, {\"bbox_2d\": [320, 146, 604, 268], \"label\": \"Text\"}, {\"bbox_2d\": [446, 282, 899, 374], \"label\": \"Text\"}, {\"bbox_2d\": [549, 418, 895, 515], \"label\": \"Text\"}, {\"bbox_2d\": [526, 154, 897, 203], \"label\": \"Table\"}, {\"bbox_2d\": [640, 392, 825, 416], \"label\": \"Section-header\"}, {\"bbox_2d\": [640, 392, 825, 416], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 417, 897, 429], \"label\": \"Text\"}, {\"bbox_2d\": [91, 289, 418, 305], \"label\": \"Section-header\"}, {\"bbox_2d\": [554, 808, 678, 826], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 683, 215, 708], \"label\": \"Picture\"}, {\"bbox_2d\": [89, 988, 109, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [90, 313, 418, 425], \"label\": \"Picture\"}, {\"bbox_2d\": [597, 735, 616, 749], \"label\": \"Picture\"}, {\"bbox_2d\": [125, 549, 1035, 997], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [0, 0, 275, 143], \"label\": \"Picture\"}, {\"bbox_2d\": [319, 106, 881, 133], \"label\": \"Text\"}, {\"bbox_2d\": [319, 148, 609, 269], \"label\": \"Text\"}, {\"bbox_2d\": [446, 284, 588, 327], \"label\": \"Text\"}, {\"bbox_2d\": [446, 331, 899, 374], \"label\": \"Text\"}, {\"bbox_2d\": [629, 153, 766, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [319, 47, 893, 88], \"label\": \"Section-header\"}, {\"bbox_2d\": [59, 158, 265, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [641, 396, 851, 416], \"label\": \"Section-header\"}, {\"bbox_2d\": [90, 289, 423, 480], \"label\": \"Picture\"}, {\"bbox_2d\": [629, 164, 905, 317], \"label\": \"Table\"}, {\"bbox_2d\": [640, 420, 916, 513], \"label\": \"Table\"}, {\"bbox_2d\": [93, 990, 108, 998], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [294, 83, 926, 145], \"label\": \"Text\"}, {\"bbox_2d\": [294, 185, 922, 262], \"label\": \"Text\"}, {\"bbox_2d\": [294, 279, 638, 293], \"label\": \"Text\"}, {\"bbox_2d\": [294, 864, 917, 925], \"label\": \"Text\"}, {\"bbox_2d\": [610, 979, 932, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [294, 837, 511, 848], \"label\": \"Caption\"}, {\"bbox_2d\": [217, 162, 359, 181], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 321, 538, 433], \"label\": \"Picture\"}, {\"bbox_2d\": [582, 320, 824, 434], \"label\": \"Picture\"}, {\"bbox_2d\": [582, 451, 824, 564], \"label\": \"Picture\"}, {\"bbox_2d\": [582, 582, 824, 694], \"label\": \"Picture\"}, {\"bbox_2d\": [294, 451, 538, 564], \"label\": \"Picture\"}, {\"bbox_2d\": [294, 582, 538, 694], \"label\": \"Picture\"}, {\"bbox_2d\": [294, 709, 538, 823], \"label\": \"Picture\"}, {\"bbox_2d\": [582, 709, 824, 823], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 923, 145], \"label\": \"Text\"}, {\"bbox_2d\": [293, 187, 915, 263], \"label\": \"Text\"}, {\"bbox_2d\": [293, 281, 637, 294], \"label\": \"Text\"}, {\"bbox_2d\": [293, 865, 912, 925], \"label\": \"Text\"}, {\"bbox_2d\": [293, 838, 512, 848], \"label\": \"Caption\"}, {\"bbox_2d\": [610, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 318, 825, 823], \"label\": \"Picture\"}, {\"bbox_2d\": [217, 163, 359, 180], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"Text\"}, {\"bbox_2d\": [231, 132, 911, 190], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 617, 923, 659], \"label\": \"List-item\"}, {\"bbox_2d\": [266, 546, 914, 591], \"label\": \"Text\"}, {\"bbox_2d\": [231, 499, 608, 509], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 934, 645, 945], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 679, 927, 930], \"label\": \"Picture\"}, {\"bbox_2d\": [229, 208, 820, 494], \"label\": \"Picture\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 925, 121], \"label\": \"Text\"}, {\"bbox_2d\": [231, 131, 911, 190], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 616, 923, 659], \"label\": \"List-item\"}, {\"bbox_2d\": [266, 547, 915, 590], \"label\": \"Text\"}, {\"bbox_2d\": [231, 935, 644, 945], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 498, 606, 509], \"label\": \"Caption\"}, {\"bbox_2d\": [230, 678, 925, 929], \"label\": \"Picture\"}, {\"bbox_2d\": [232, 209, 817, 495], \"label\": \"Picture\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [340, 40, 959, 73], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 104, 933, 147], \"label\": \"Text\"}, {\"bbox_2d\": [125, 164, 925, 192], \"label\": \"Text\"}, {\"bbox_2d\": [125, 209, 912, 237], \"label\": \"Text\"}, {\"bbox_2d\": [125, 269, 932, 387], \"label\": \"Text\"}, {\"bbox_2d\": [125, 453, 933, 498], \"label\": \"Text\"}, {\"bbox_2d\": [125, 514, 933, 558], \"label\": \"Text\"}, {\"bbox_2d\": [125, 574, 926, 660], \"label\": \"Text\"}, {\"bbox_2d\": [125, 677, 902, 721], \"label\": \"Text\"}, {\"bbox_2d\": [125, 737, 906, 765], \"label\": \"Text\"}, {\"bbox_2d\": [125, 782, 922, 826], \"label\": \"Text\"}, {\"bbox_2d\": [125, 842, 926, 886], \"label\": \"Text\"}, {\"bbox_2d\": [125, 902, 910, 975], \"label\": \"Text\"}, {\"bbox_2d\": [777, 990, 972, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [125, 399, 353, 417], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 425, 290, 440], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 248, 372, 262], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [125, 106, 932, 148], \"label\": \"Text\"}, {\"bbox_2d\": [125, 166, 936, 192], \"label\": \"Text\"}, {\"bbox_2d\": [125, 210, 913, 237], \"label\": \"Text\"}, {\"bbox_2d\": [125, 270, 952, 386], \"label\": \"Text\"}, {\"bbox_2d\": [124, 456, 934, 498], \"label\": \"Text\"}, {\"bbox_2d\": [124, 516, 932, 557], \"label\": \"Text\"}, {\"bbox_2d\": [124, 575, 938, 662], \"label\": \"Text\"}, {\"bbox_2d\": [124, 680, 901, 721], \"label\": \"Text\"}, {\"bbox_2d\": [124, 739, 920, 766], \"label\": \"Text\"}, {\"bbox_2d\": [124, 784, 935, 825], \"label\": \"Text\"}, {\"bbox_2d\": [124, 844, 924, 885], \"label\": \"Text\"}, {\"bbox_2d\": [124, 903, 921, 974], \"label\": \"Text\"}, {\"bbox_2d\": [778, 991, 971, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [126, 429, 293, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [126, 404, 354, 416], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 251, 371, 262], \"label\": \"Section-header\"}, {\"bbox_2d\": [128, 43, 958, 74], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [154, 91, 469, 107], \"label\": \"Text\"}, {\"bbox_2d\": [125, 176, 911, 220], \"label\": \"Text\"}, {\"bbox_2d\": [125, 241, 836, 255], \"label\": \"Text\"}, {\"bbox_2d\": [125, 325, 507, 339], \"label\": \"Text\"}, {\"bbox_2d\": [125, 615, 911, 712], \"label\": \"Text\"}, {\"bbox_2d\": [154, 721, 493, 737], \"label\": \"Text\"}, {\"bbox_2d\": [125, 804, 911, 900], \"label\": \"Text\"}, {\"bbox_2d\": [154, 916, 911, 930], \"label\": \"Text\"}, {\"bbox_2d\": [508, 972, 527, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [346, 147, 713, 180], \"label\": \"Formula\"}, {\"bbox_2d\": [396, 279, 644, 312], \"label\": \"Formula\"}, {\"bbox_2d\": [175, 360, 906, 623], \"label\": \"Formula\"}, {\"bbox_2d\": [347, 749, 713, 799], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [154, 92, 467, 105], \"label\": \"Text\"}, {\"bbox_2d\": [125, 174, 911, 217], \"label\": \"Text\"}, {\"bbox_2d\": [125, 242, 832, 255], \"label\": \"Text\"}, {\"bbox_2d\": [125, 324, 619, 338], \"label\": \"Text\"}, {\"bbox_2d\": [125, 612, 911, 710], \"label\": \"Text\"}, {\"bbox_2d\": [154, 724, 522, 737], \"label\": \"Text\"}, {\"bbox_2d\": [125, 806, 911, 910], \"label\": \"Text\"}, {\"bbox_2d\": [154, 904, 911, 937], \"label\": \"Text\"}, {\"bbox_2d\": [508, 974, 527, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [349, 118, 686, 169], \"label\": \"Formula\"}, {\"bbox_2d\": [398, 276, 638, 317], \"label\": \"Formula\"}, {\"bbox_2d\": [175, 361, 861, 589], \"label\": \"Formula\"}, {\"bbox_2d\": [346, 750, 689, 801], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [901, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [412, 95, 622, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 293], \"label\": \"Text\"}, {\"bbox_2d\": [121, 305, 911, 649], \"label\": \"Text\"}, {\"bbox_2d\": [121, 661, 911, 950], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [901, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [411, 97, 621, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 293], \"label\": \"Text\"}, {\"bbox_2d\": [121, 305, 911, 649], \"label\": \"Text\"}, {\"bbox_2d\": [121, 661, 911, 950], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 25, 351, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 96, 285, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 122, 774, 134], \"label\": \"Text\"}, {\"bbox_2d\": [91, 148, 460, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 919, 936, 950], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 952, 318, 963], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 999, 446, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [89, 172, 952, 907], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 98, 285, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 125, 773, 135], \"label\": \"Text\"}, {\"bbox_2d\": [91, 149, 459, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [89, 175, 948, 911], \"label\": \"Table\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 918, 944, 949], \"label\": \"Text\"}, {\"bbox_2d\": [91, 953, 320, 961], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 45, 951, 120], \"label\": \"Text\"}, {\"bbox_2d\": [85, 134, 760, 149], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 152, 960, 260], \"label\": \"Text\"}, {\"bbox_2d\": [85, 274, 963, 477], \"label\": \"Text\"}, {\"bbox_2d\": [85, 492, 804, 507], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 509, 950, 586], \"label\": \"Text\"}, {\"bbox_2d\": [85, 644, 957, 688], \"label\": \"Text\"}, {\"bbox_2d\": [85, 699, 922, 733], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 626, 703, 641], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 600, 515, 614], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 736, 957, 859], \"label\": \"Text\"}, {\"bbox_2d\": [85, 874, 950, 919], \"label\": \"Text\"}, {\"bbox_2d\": [85, 999, 94, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 58, 951, 120], \"label\": \"Text\"}, {\"bbox_2d\": [85, 149, 966, 259], \"label\": \"Text\"}, {\"bbox_2d\": [85, 288, 964, 478], \"label\": \"Text\"}, {\"bbox_2d\": [85, 506, 956, 585], \"label\": \"Text\"}, {\"bbox_2d\": [85, 641, 962, 688], \"label\": \"Text\"}, {\"bbox_2d\": [85, 732, 960, 859], \"label\": \"Text\"}, {\"bbox_2d\": [85, 871, 955, 918], \"label\": \"Text\"}, {\"bbox_2d\": [85, 997, 92, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 698, 926, 733], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 623, 707, 642], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 595, 515, 614], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 488, 803, 507], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 270, 964, 288], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 131, 759, 149], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 39, 951, 58], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [789, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 74, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [159, 156, 434, 273], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 327, 353, 543], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 589, 311, 732], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 776, 433, 902], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 156, 860, 457], \"label\": \"Picture\"}, {\"bbox_2d\": [552, 467, 837, 606], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 610, 836, 856], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 857, 815, 967], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [158, 156, 501, 727], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 773, 498, 905], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 153, 857, 458], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 465, 898, 830], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 838, 896, 966], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [506, 3, 894, 48], \"label\": \"Picture\"}, {\"bbox_2d\": [552, 61, 901, 99], \"label\": \"Table\"}, {\"bbox_2d\": [141, 54, 411, 76], \"label\": \"Title\"}, {\"bbox_2d\": [141, 83, 249, 99], \"label\": \"Text\"}, {\"bbox_2d\": [141, 127, 486, 165], \"label\": \"Title\"}, {\"bbox_2d\": [141, 179, 497, 202], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 218, 493, 241], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 258, 271, 269], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 280, 338, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 303, 430, 314], \"label\": \"Title\"}, {\"bbox_2d\": [156, 324, 501, 333], \"label\": \"Text\"}, {\"bbox_2d\": [141, 343, 501, 353], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 361, 501, 383], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 401, 501, 424], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 437, 374, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 460, 413, 468], \"label\": \"List-item\"}, {\"bbox_2d\": [165, 479, 501, 489], \"label\": \"Text\"}, {\"bbox_2d\": [531, 417, 895, 456], \"label\": \"Text\"}, {\"bbox_2d\": [531, 298, 895, 325], \"label\": \"Text\"}, {\"bbox_2d\": [531, 353, 790, 379], \"label\": \"Text\"}, {\"bbox_2d\": [531, 392, 753, 402], \"label\": \"Section-header\"}, {\"bbox_2d\": [619, 479, 809, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [578, 199, 827, 209], \"label\": \"Section-header\"}, {\"bbox_2d\": [552, 127, 895, 182], \"label\": \"Table\"}, {\"bbox_2d\": [552, 225, 769, 250], \"label\": \"Table\"}, {\"bbox_2d\": [648, 41, 757, 50], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [503, 6, 896, 37], \"label\": \"Picture\"}, {\"bbox_2d\": [645, 39, 760, 50], \"label\": \"Text\"}, {\"bbox_2d\": [139, 55, 454, 75], \"label\": \"Title\"}, {\"bbox_2d\": [140, 83, 249, 98], \"label\": \"Title\"}, {\"bbox_2d\": [552, 59, 901, 75], \"label\": \"Title\"}, {\"bbox_2d\": [551, 84, 901, 98], \"label\": \"Title\"}, {\"bbox_2d\": [140, 126, 489, 163], \"label\": \"Title\"}, {\"bbox_2d\": [140, 178, 502, 202], \"label\": \"List-item\"}, {\"bbox_2d\": [140, 216, 496, 240], \"label\": \"List-item\"}, {\"bbox_2d\": [140, 258, 300, 269], \"label\": \"List-item\"}, {\"bbox_2d\": [140, 280, 341, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [139, 340, 504, 353], \"label\": \"List-item\"}, {\"bbox_2d\": [139, 357, 504, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [139, 398, 504, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [140, 301, 433, 312], \"label\": \"Section-header\"}, {\"bbox_2d\": [139, 436, 377, 447], \"label\": \"Section-header\"}, {\"bbox_2d\": [227, 458, 415, 468], \"label\": \"Section-header\"}, {\"bbox_2d\": [618, 476, 811, 487], \"label\": \"Section-header\"}, {\"bbox_2d\": [531, 391, 757, 402], \"label\": \"Section-header\"}, {\"bbox_2d\": [631, 268, 799, 278], \"label\": \"Section-header\"}, {\"bbox_2d\": [600, 198, 829, 209], \"label\": \"Section-header\"}, {\"bbox_2d\": [556, 125, 899, 180], \"label\": \"Table\"}, {\"bbox_2d\": [155, 322, 503, 332], \"label\": \"Text\"}, {\"bbox_2d\": [165, 478, 502, 488], \"label\": \"Text\"}, {\"bbox_2d\": [564, 223, 744, 233], \"label\": \"Text\"}, {\"bbox_2d\": [565, 237, 786, 247], \"label\": \"Text\"}, {\"bbox_2d\": [530, 299, 898, 324], \"label\": \"Text\"}, {\"bbox_2d\": [530, 352, 794, 363], \"label\": \"Text\"}, {\"bbox_2d\": [530, 368, 786, 378], \"label\": \"Text\"}, {\"bbox_2d\": [530, 417, 899, 456], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [736, 60, 901, 72], \"label\": \"Page-header\"}, {\"bbox_2d\": [460, 73, 622, 85], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 133, 583, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [183, 179, 387, 191], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 33, 291, 122], \"label\": \"Picture\"}, {\"bbox_2d\": [828, 84, 903, 121], \"label\": \"Picture\"}, {\"bbox_2d\": [211, 298, 802, 777], \"label\": \"Picture\"}, {\"bbox_2d\": [886, 974, 900, 986], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [886, 973, 900, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 31, 294, 122], \"label\": \"Picture\"}, {\"bbox_2d\": [179, 176, 387, 191], \"label\": \"List-item\"}, {\"bbox_2d\": [458, 72, 627, 86], \"label\": \"Text\"}, {\"bbox_2d\": [828, 83, 902, 121], \"label\": \"Picture\"}, {\"bbox_2d\": [735, 60, 905, 71], \"label\": \"Page-header\"}, {\"bbox_2d\": [496, 131, 590, 147], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 92, 523, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [787, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 145, 349, 155], \"label\": \"Text\"}, {\"bbox_2d\": [677, 145, 746, 155], \"label\": \"Text\"}, {\"bbox_2d\": [130, 359, 501, 466], \"label\": \"Text\"}, {\"bbox_2d\": [171, 189, 349, 331], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 154, 883, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [165, 523, 460, 636], \"label\": \"Picture\"}, {\"bbox_2d\": [165, 671, 460, 802], \"label\": \"Picture\"}, {\"bbox_2d\": [171, 840, 460, 968], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [786, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 359, 500, 465], \"label\": \"Text\"}, {\"bbox_2d\": [167, 527, 459, 967], \"label\": \"Picture\"}, {\"bbox_2d\": [258, 142, 372, 332], \"label\": \"Picture\"}, {\"bbox_2d\": [521, 140, 892, 970], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 98], \"label\": \"Text\"}, {\"bbox_2d\": [91, 71, 506, 98], \"label\": \"Text\"}, {\"bbox_2d\": [91, 101, 506, 280], \"label\": \"Text\"}, {\"bbox_2d\": [91, 320, 506, 390], \"label\": \"Text\"}, {\"bbox_2d\": [91, 394, 506, 417], \"label\": \"Text\"}, {\"bbox_2d\": [536, 158, 951, 333], \"label\": \"Text\"}, {\"bbox_2d\": [91, 294, 498, 305], \"label\": \"Text\"}, {\"bbox_2d\": [91, 476, 234, 487], \"label\": \"Text\"}, {\"bbox_2d\": [536, 440, 951, 513], \"label\": \"Text\"}, {\"bbox_2d\": [91, 529, 506, 573], \"label\": \"Text\"}, {\"bbox_2d\": [91, 612, 506, 681], \"label\": \"Text\"}, {\"bbox_2d\": [91, 699, 506, 726], \"label\": \"Text\"}, {\"bbox_2d\": [536, 638, 951, 717], \"label\": \"Text\"}, {\"bbox_2d\": [91, 766, 506, 853], \"label\": \"Text\"}, {\"bbox_2d\": [536, 744, 951, 785], \"label\": \"Text\"}, {\"bbox_2d\": [91, 918, 506, 945], \"label\": \"Text\"}, {\"bbox_2d\": [536, 833, 908, 844], \"label\": \"Text\"}, {\"bbox_2d\": [536, 546, 930, 585], \"label\": \"Section-header\"}, {\"bbox_2d\": [625, 604, 842, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [567, 802, 951, 818], \"label\": \"Formula\"}, {\"bbox_2d\": [656, 905, 829, 921], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 935, 937, 945], \"label\": \"Text\"}, {\"bbox_2d\": [91, 875, 490, 905], \"label\": \"Formula\"}, {\"bbox_2d\": [227, 741, 374, 752], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 583, 370, 600], \"label\": \"Formula\"}, {\"bbox_2d\": [219, 440, 506, 460], \"label\": \"Formula\"}, {\"bbox_2d\": [598, 108, 892, 160], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 351, 951, 392], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [943, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 72, 951, 98], \"label\": \"Text\"}, {\"bbox_2d\": [91, 72, 506, 98], \"label\": \"Text\"}, {\"bbox_2d\": [91, 102, 506, 278], \"label\": \"Text\"}, {\"bbox_2d\": [536, 157, 951, 334], \"label\": \"Text\"}, {\"bbox_2d\": [91, 320, 506, 392], \"label\": \"Text\"}, {\"bbox_2d\": [596, 110, 891, 141], \"label\": \"Formula\"}, {\"bbox_2d\": [100, 293, 497, 305], \"label\": \"Formula\"}, {\"bbox_2d\": [216, 435, 506, 464], \"label\": \"Formula\"}, {\"bbox_2d\": [658, 408, 830, 423], \"label\": \"Formula\"}, {\"bbox_2d\": [225, 582, 371, 597], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 501, 392, 513], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 351, 951, 392], \"label\": \"Text\"}, {\"bbox_2d\": [91, 395, 506, 422], \"label\": \"Text\"}, {\"bbox_2d\": [91, 474, 235, 486], \"label\": \"Text\"}, {\"bbox_2d\": [91, 529, 506, 570], \"label\": \"Text\"}, {\"bbox_2d\": [536, 440, 951, 512], \"label\": \"Text\"}, {\"bbox_2d\": [91, 610, 506, 684], \"label\": \"Text\"}, {\"bbox_2d\": [536, 638, 951, 740], \"label\": \"Text\"}, {\"bbox_2d\": [91, 699, 506, 726], \"label\": \"Text\"}, {\"bbox_2d\": [536, 743, 951, 785], \"label\": \"Text\"}, {\"bbox_2d\": [91, 768, 506, 854], \"label\": \"Text\"}, {\"bbox_2d\": [557, 546, 930, 583], \"label\": \"Section-header\"}, {\"bbox_2d\": [647, 605, 840, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [578, 801, 951, 813], \"label\": \"Formula\"}, {\"bbox_2d\": [225, 738, 371, 753], \"label\": \"Formula\"}, {\"bbox_2d\": [660, 903, 827, 916], \"label\": \"Formula\"}, {\"bbox_2d\": [108, 873, 489, 904], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 831, 908, 843], \"label\": \"Text\"}, {\"bbox_2d\": [536, 860, 951, 887], \"label\": \"Text\"}, {\"bbox_2d\": [536, 934, 937, 946], \"label\": \"Text\"}, {\"bbox_2d\": [91, 919, 506, 946], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [407, 709, 971, 731], \"label\": \"Title\"}, {\"bbox_2d\": [870, 555, 962, 635], \"label\": \"Picture\"}, {\"bbox_2d\": [0, 0, 1035, 1035], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [0, 781, 342, 1035], \"label\": \"Picture\"}, {\"bbox_2d\": [1, 521, 342, 776], \"label\": \"Picture\"}, {\"bbox_2d\": [0, 262, 342, 515], \"label\": \"Picture\"}, {\"bbox_2d\": [348, 262, 685, 519], \"label\": \"Picture\"}, {\"bbox_2d\": [691, 261, 1035, 518], \"label\": \"Picture\"}, {\"bbox_2d\": [3, 0, 344, 259], \"label\": \"Picture\"}, {\"bbox_2d\": [344, -1, 687, 258], \"label\": \"Picture\"}, {\"bbox_2d\": [693, -1, 1035, 255], \"label\": \"Picture\"}, {\"bbox_2d\": [409, 710, 971, 729], \"label\": \"Text\"}, {\"bbox_2d\": [400, 625, 980, 706], \"label\": \"Title\"}, {\"bbox_2d\": [348, 778, 687, 1035], \"label\": \"Picture\"}, {\"bbox_2d\": [695, 778, 1035, 1035], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [771, 77, 942, 108], \"label\": \"Page-header\"}, {\"bbox_2d\": [329, 116, 707, 150], \"label\": \"Caption\"}, {\"bbox_2d\": [333, 929, 694, 948], \"label\": \"Caption\"}, {\"bbox_2d\": [338, 962, 678, 976], \"label\": \"Caption\"}, {\"bbox_2d\": [500, 997, 517, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [118, 108, 909, 957], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [115, 108, 911, 956], \"label\": \"Picture\"}, {\"bbox_2d\": [336, 959, 679, 975], \"label\": \"Caption\"}, {\"bbox_2d\": [776, 81, 953, 115], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [81, 110, 486, 217], \"label\": \"Text\"}, {\"bbox_2d\": [81, 1004, 246, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [532, 484, 924, 527], \"label\": \"Text\"}, {\"bbox_2d\": [532, 466, 895, 481], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 438, 625, 453], \"label\": \"Section-header\"}, {\"bbox_2d\": [532, 123, 954, 396], \"label\": \"Table\"}, {\"bbox_2d\": [532, 548, 954, 739], \"label\": \"Table\"}, {\"bbox_2d\": [81, 225, 499, 924], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [80, 226, 505, 926], \"label\": \"Table\"}, {\"bbox_2d\": [532, 534, 956, 742], \"label\": \"Table\"}, {\"bbox_2d\": [530, 113, 957, 398], \"label\": \"Table\"}, {\"bbox_2d\": [81, 112, 492, 216], \"label\": \"Text\"}, {\"bbox_2d\": [533, 434, 625, 452], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 467, 895, 480], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 485, 928, 527], \"label\": \"Text\"}, {\"bbox_2d\": [81, 1005, 94, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [111, 1005, 246, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [83, 73, 508, 115], \"label\": \"Text\"}, {\"bbox_2d\": [83, 128, 508, 456], \"label\": \"Text\"}, {\"bbox_2d\": [83, 469, 508, 770], \"label\": \"Text\"}, {\"bbox_2d\": [83, 783, 508, 979], \"label\": \"Text\"}, {\"bbox_2d\": [528, 73, 953, 115], \"label\": \"Text\"}, {\"bbox_2d\": [528, 182, 953, 274], \"label\": \"Text\"}, {\"bbox_2d\": [528, 287, 953, 588], \"label\": \"Text\"}, {\"bbox_2d\": [528, 601, 953, 979], \"label\": \"Text\"}, {\"bbox_2d\": [667, 150, 813, 162], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [947, 39, 953, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [82, 76, 507, 114], \"label\": \"Text\"}, {\"bbox_2d\": [82, 130, 507, 454], \"label\": \"Text\"}, {\"bbox_2d\": [82, 470, 507, 769], \"label\": \"Text\"}, {\"bbox_2d\": [82, 785, 507, 979], \"label\": \"Text\"}, {\"bbox_2d\": [528, 76, 953, 114], \"label\": \"Text\"}, {\"bbox_2d\": [528, 184, 953, 274], \"label\": \"Text\"}, {\"bbox_2d\": [528, 289, 953, 587], \"label\": \"Text\"}, {\"bbox_2d\": [528, 602, 953, 979], \"label\": \"Text\"}, {\"bbox_2d\": [667, 150, 813, 162], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 34, 411, 88], \"label\": \"Picture\"}, {\"bbox_2d\": [127, 141, 906, 183], \"label\": \"Text\"}, {\"bbox_2d\": [141, 201, 906, 244], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 262, 835, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 291, 683, 302], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 354, 906, 457], \"label\": \"Text\"}, {\"bbox_2d\": [133, 324, 390, 335], \"label\": \"Section-header\"}, {\"bbox_2d\": [126, 473, 912, 940], \"label\": \"Table\"}, {\"bbox_2d\": [533, 971, 552, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [103, 32, 426, 98], \"label\": \"Picture\"}, {\"bbox_2d\": [172, 139, 908, 183], \"label\": \"Text\"}, {\"bbox_2d\": [141, 199, 908, 243], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 258, 841, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 288, 690, 303], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 321, 396, 335], \"label\": \"Section-header\"}, {\"bbox_2d\": [133, 353, 908, 457], \"label\": \"Text\"}, {\"bbox_2d\": [124, 472, 905, 947], \"label\": \"Table\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 93, 354, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 907, 171], \"label\": \"Text\"}, {\"bbox_2d\": [231, 501, 892, 529], \"label\": \"Text\"}, {\"bbox_2d\": [231, 545, 910, 590], \"label\": \"Text\"}, {\"bbox_2d\": [231, 892, 892, 950], \"label\": \"Text\"}, {\"bbox_2d\": [231, 857, 472, 868], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 467, 505, 478], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 608, 923, 854], \"label\": \"Picture\"}, {\"bbox_2d\": [229, 188, 792, 461], \"label\": \"Picture\"}, {\"bbox_2d\": [717, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 93, 355, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 924, 170], \"label\": \"Text\"}, {\"bbox_2d\": [231, 501, 897, 529], \"label\": \"Text\"}, {\"bbox_2d\": [231, 545, 914, 589], \"label\": \"Text\"}, {\"bbox_2d\": [231, 467, 505, 478], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 857, 472, 868], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 891, 894, 950], \"label\": \"Text\"}, {\"bbox_2d\": [232, 190, 791, 460], \"label\": \"Picture\"}, {\"bbox_2d\": [229, 607, 921, 854], \"label\": \"Picture\"}, {\"bbox_2d\": [717, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [909, 47, 922, 64], \"label\": \"Page-header\"}, {\"bbox_2d\": [311, 96, 721, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 141, 911, 183], \"label\": \"Text\"}, {\"bbox_2d\": [121, 316, 168, 337], \"label\": \"Text\"}, {\"bbox_2d\": [121, 354, 911, 486], \"label\": \"Text\"}, {\"bbox_2d\": [121, 555, 911, 654], \"label\": \"Text\"}, {\"bbox_2d\": [121, 723, 911, 765], \"label\": \"Text\"}, {\"bbox_2d\": [121, 778, 911, 957], \"label\": \"Text\"}, {\"bbox_2d\": [239, 201, 911, 305], \"label\": \"Formula\"}, {\"bbox_2d\": [264, 346, 911, 368], \"label\": \"Formula\"}, {\"bbox_2d\": [239, 479, 911, 528], \"label\": \"Formula\"}, {\"bbox_2d\": [393, 565, 911, 614], \"label\": \"Formula\"}, {\"bbox_2d\": [393, 677, 911, 704], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [901, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [312, 96, 720, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 137, 910, 184], \"label\": \"Text\"}, {\"bbox_2d\": [271, 202, 911, 316], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 320, 172, 336], \"label\": \"Text\"}, {\"bbox_2d\": [292, 345, 911, 374], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 380, 911, 454], \"label\": \"Text\"}, {\"bbox_2d\": [239, 479, 911, 508], \"label\": \"Formula\"}, {\"bbox_2d\": [383, 510, 911, 539], \"label\": \"Formula\"}, {\"bbox_2d\": [392, 681, 911, 705], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 554, 911, 653], \"label\": \"Text\"}, {\"bbox_2d\": [121, 724, 911, 767], \"label\": \"Text\"}, {\"bbox_2d\": [121, 778, 911, 958], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [56, 168, 250, 229], \"label\": \"Text\"}, {\"bbox_2d\": [293, 168, 493, 229], \"label\": \"Text\"}, {\"bbox_2d\": [533, 168, 732, 261], \"label\": \"Text\"}, {\"bbox_2d\": [533, 336, 650, 384], \"label\": \"Text\"}, {\"bbox_2d\": [56, 258, 262, 413], \"label\": \"Text\"}, {\"bbox_2d\": [56, 241, 213, 253], \"label\": \"Section-header\"}, {\"bbox_2d\": [56, 674, 204, 715], \"label\": \"Caption\"}, {\"bbox_2d\": [60, 283, 1034, 1034], \"label\": \"Picture\"}, {\"bbox_2d\": [814, 997, 1001, 1020], \"label\": \"Page-footer\"}, {\"bbox_2d\": [529, 283, 745, 324], \"label\": \"Picture\"}, {\"bbox_2d\": [533, 336, 649, 365], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [55, 170, 250, 228], \"label\": \"Text\"}, {\"bbox_2d\": [55, 258, 263, 413], \"label\": \"Text\"}, {\"bbox_2d\": [293, 170, 494, 229], \"label\": \"Text\"}, {\"bbox_2d\": [293, 242, 505, 396], \"label\": \"Text\"}, {\"bbox_2d\": [532, 170, 732, 260], \"label\": \"Text\"}, {\"bbox_2d\": [532, 338, 650, 383], \"label\": \"Caption\"}, {\"bbox_2d\": [55, 243, 215, 253], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 281, 748, 324], \"label\": \"Picture\"}, {\"bbox_2d\": [55, 675, 204, 714], \"label\": \"Text\"}, {\"bbox_2d\": [865, 1003, 999, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 84, 805, 97], \"label\": \"Text\"}, {\"bbox_2d\": [273, 424, 880, 438], \"label\": \"Text\"}, {\"bbox_2d\": [273, 456, 651, 469], \"label\": \"Text\"}, {\"bbox_2d\": [273, 799, 876, 828], \"label\": \"Text\"}, {\"bbox_2d\": [273, 847, 901, 938], \"label\": \"Text\"}, {\"bbox_2d\": [273, 109, 636, 121], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 122, 913, 396], \"label\": \"Table\"}, {\"bbox_2d\": [294, 483, 822, 751], \"label\": \"Text\"}, {\"bbox_2d\": [273, 473, 910, 773], \"label\": \"Picture\"}, {\"bbox_2d\": [99, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 806, 98], \"label\": \"Text\"}, {\"bbox_2d\": [272, 110, 638, 121], \"label\": \"Caption\"}, {\"bbox_2d\": [272, 122, 913, 397], \"label\": \"Table\"}, {\"bbox_2d\": [273, 423, 880, 439], \"label\": \"Text\"}, {\"bbox_2d\": [273, 457, 650, 470], \"label\": \"Text\"}, {\"bbox_2d\": [273, 800, 876, 829], \"label\": \"Text\"}, {\"bbox_2d\": [273, 847, 900, 939], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [310, 504, 615, 541], \"label\": \"Table\"}, {\"bbox_2d\": [310, 574, 817, 716], \"label\": \"Table\"}, {\"bbox_2d\": [297, 552, 472, 563], \"label\": \"Text\"}, {\"bbox_2d\": [472, 481, 648, 493], \"label\": \"Section-header\"}, {\"bbox_2d\": [776, 728, 824, 740], \"label\": \"Text\"}, {\"bbox_2d\": [297, 740, 601, 751], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 914, 121], \"label\": \"Text\"}, {\"bbox_2d\": [231, 182, 892, 194], \"label\": \"Text\"}, {\"bbox_2d\": [231, 287, 913, 332], \"label\": \"Text\"}, {\"bbox_2d\": [231, 347, 895, 375], \"label\": \"Text\"}, {\"bbox_2d\": [231, 508, 926, 568], \"label\": \"Text\"}, {\"bbox_2d\": [231, 584, 923, 629], \"label\": \"Text\"}, {\"bbox_2d\": [231, 827, 926, 886], \"label\": \"Text\"}, {\"bbox_2d\": [231, 902, 919, 947], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 699, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 637, 926, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 676, 817, 688], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 692, 784, 704], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 730, 888, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 767, 908, 811], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 439, 926, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 384, 915, 429], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 244, 926, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 204, 926, 232], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 148, 399, 164], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 488, 826, 503], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 914, 121], \"label\": \"Text\"}, {\"bbox_2d\": [231, 182, 888, 194], \"label\": \"Text\"}, {\"bbox_2d\": [231, 287, 914, 330], \"label\": \"Text\"}, {\"bbox_2d\": [231, 347, 896, 375], \"label\": \"Text\"}, {\"bbox_2d\": [231, 508, 926, 567], \"label\": \"Text\"}, {\"bbox_2d\": [231, 584, 922, 627], \"label\": \"Text\"}, {\"bbox_2d\": [231, 827, 926, 886], \"label\": \"Text\"}, {\"bbox_2d\": [231, 903, 919, 946], \"label\": \"Text\"}, {\"bbox_2d\": [231, 767, 908, 810], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 729, 888, 757], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 637, 926, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 675, 823, 687], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 691, 808, 703], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 707, 790, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 438, 926, 466], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 385, 914, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 242, 926, 270], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 204, 925, 232], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 148, 394, 163], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 488, 822, 502], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 27, 211, 43], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 44, 438, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 93, 322, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 63, 627, 77], \"label\": \"Text\"}, {\"bbox_2d\": [85, 109, 427, 122], \"label\": \"Text\"}, {\"bbox_2d\": [85, 328, 953, 423], \"label\": \"Text\"}, {\"bbox_2d\": [85, 436, 418, 450], \"label\": \"Text\"}, {\"bbox_2d\": [85, 588, 957, 652], \"label\": \"Text\"}, {\"bbox_2d\": [85, 304, 601, 315], \"label\": \"Text\"}, {\"bbox_2d\": [85, 136, 964, 303], \"label\": \"Table\"}, {\"bbox_2d\": [86, 463, 965, 573], \"label\": \"Table\"}, {\"bbox_2d\": [85, 664, 965, 718], \"label\": \"Table\"}, {\"bbox_2d\": [85, 994, 99, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 22, 628, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [85, 90, 321, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 995, 100, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 588, 961, 650], \"label\": \"Text\"}, {\"bbox_2d\": [85, 328, 956, 423], \"label\": \"Text\"}, {\"bbox_2d\": [85, 435, 420, 451], \"label\": \"Text\"}, {\"bbox_2d\": [85, 303, 601, 315], \"label\": \"Text\"}, {\"bbox_2d\": [85, 108, 428, 123], \"label\": \"Text\"}, {\"bbox_2d\": [82, 132, 966, 301], \"label\": \"Table\"}, {\"bbox_2d\": [84, 459, 966, 573], \"label\": \"Table\"}, {\"bbox_2d\": [82, 659, 967, 717], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [663, 222, 781, 243], \"label\": \"Caption\"}, {\"bbox_2d\": [148, 194, 548, 217], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 295, 511, 427], \"label\": \"Text\"}, {\"bbox_2d\": [148, 434, 494, 548], \"label\": \"Text\"}, {\"bbox_2d\": [148, 572, 511, 669], \"label\": \"Text\"}, {\"bbox_2d\": [148, 676, 511, 824], \"label\": \"Text\"}, {\"bbox_2d\": [148, 831, 511, 979], \"label\": \"Text\"}, {\"bbox_2d\": [548, 831, 906, 959], \"label\": \"Text\"}, {\"bbox_2d\": [548, 623, 906, 823], \"label\": \"Text\"}, {\"bbox_2d\": [548, 400, 910, 618], \"label\": \"Text\"}, {\"bbox_2d\": [548, 295, 901, 392], \"label\": \"Text\"}, {\"bbox_2d\": [814, 1003, 985, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [148, 120, 381, 162], \"label\": \"Page-header\"}, {\"bbox_2d\": [666, 1, 911, 214], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [666, 223, 783, 242], \"label\": \"Caption\"}, {\"bbox_2d\": [148, 297, 512, 427], \"label\": \"Text\"}, {\"bbox_2d\": [148, 434, 493, 548], \"label\": \"Text\"}, {\"bbox_2d\": [548, 297, 902, 395], \"label\": \"Text\"}, {\"bbox_2d\": [548, 400, 913, 619], \"label\": \"Text\"}, {\"bbox_2d\": [148, 572, 510, 671], \"label\": \"Text\"}, {\"bbox_2d\": [148, 676, 512, 826], \"label\": \"Text\"}, {\"bbox_2d\": [148, 831, 511, 981], \"label\": \"Text\"}, {\"bbox_2d\": [548, 624, 910, 826], \"label\": \"Text\"}, {\"bbox_2d\": [548, 831, 908, 963], \"label\": \"Text\"}, {\"bbox_2d\": [813, 1002, 985, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [1014, 581, 1024, 633], \"label\": \"Page-header\"}, {\"bbox_2d\": [148, 123, 381, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [666, 0, 914, 214], \"label\": \"Picture\"}, {\"bbox_2d\": [148, 196, 549, 217], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [323, 38, 686, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 75, 506, 253], \"label\": \"Text\"}, {\"bbox_2d\": [510, 75, 903, 525], \"label\": \"Text\"}, {\"bbox_2d\": [510, 529, 903, 929], \"label\": \"Text\"}, {\"bbox_2d\": [91, 257, 506, 600], \"label\": \"Text\"}, {\"bbox_2d\": [91, 603, 506, 931], \"label\": \"Text\"}, {\"bbox_2d\": [106, 935, 481, 948], \"label\": \"Text\"}, {\"bbox_2d\": [87, 974, 217, 986], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [939, 38, 948, 51], \"label\": \"Page-header\"}, {\"bbox_2d\": [321, 38, 715, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 76, 481, 254], \"label\": \"Text\"}, {\"bbox_2d\": [511, 76, 904, 524], \"label\": \"Text\"}, {\"bbox_2d\": [88, 257, 481, 600], \"label\": \"Text\"}, {\"bbox_2d\": [511, 530, 904, 948], \"label\": \"Text\"}, {\"bbox_2d\": [88, 603, 480, 931], \"label\": \"Text\"}, {\"bbox_2d\": [105, 935, 480, 948], \"label\": \"Text\"}, {\"bbox_2d\": [88, 974, 216, 984], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 43, 350, 68], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 101, 398, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 118, 183, 129], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 68, 274, 79], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 127, 916, 346], \"label\": \"Table\"}, {\"bbox_2d\": [72, 349, 610, 358], \"label\": \"Text\"}, {\"bbox_2d\": [111, 360, 422, 369], \"label\": \"Text\"}, {\"bbox_2d\": [72, 401, 178, 410], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 719, 431, 730], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 415, 917, 629], \"label\": \"Table\"}, {\"bbox_2d\": [72, 632, 610, 641], \"label\": \"Text\"}, {\"bbox_2d\": [111, 643, 422, 652], \"label\": \"Text\"}, {\"bbox_2d\": [72, 654, 902, 673], \"label\": \"Text\"}, {\"bbox_2d\": [72, 805, 443, 814], \"label\": \"Text\"}, {\"bbox_2d\": [111, 815, 362, 824], \"label\": \"Text\"}, {\"bbox_2d\": [111, 825, 553, 834], \"label\": \"Text\"}, {\"bbox_2d\": [72, 739, 917, 803], \"label\": \"Table\"}, {\"bbox_2d\": [72, 999, 169, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [39, 17, 83, 29], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [37, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 47, 350, 67], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 70, 275, 78], \"label\": \"Text\"}, {\"bbox_2d\": [73, 102, 398, 112], \"label\": \"Text\"}, {\"bbox_2d\": [71, 119, 918, 346], \"label\": \"Table\"}, {\"bbox_2d\": [69, 400, 917, 627], \"label\": \"Table\"}, {\"bbox_2d\": [70, 738, 918, 800], \"label\": \"Table\"}, {\"bbox_2d\": [73, 720, 432, 730], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [72, 805, 445, 813], \"label\": \"Text\"}, {\"bbox_2d\": [111, 816, 364, 824], \"label\": \"Text\"}, {\"bbox_2d\": [111, 826, 554, 834], \"label\": \"Text\"}, {\"bbox_2d\": [72, 631, 611, 639], \"label\": \"Text\"}, {\"bbox_2d\": [111, 642, 422, 650], \"label\": \"Text\"}, {\"bbox_2d\": [111, 652, 907, 671], \"label\": \"Text\"}, {\"bbox_2d\": [72, 350, 611, 358], \"label\": \"Text\"}, {\"bbox_2d\": [111, 360, 422, 369], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [103, 47, 275, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [254, 89, 879, 964], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [102, 50, 287, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [251, 90, 886, 966], \"label\": \"Table\"}, {\"bbox_2d\": [528, 975, 544, 990], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [186, 89, 885, 132], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 154, 349, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 178, 883, 206], \"label\": \"Text\"}, {\"bbox_2d\": [685, 237, 883, 281], \"label\": \"Text\"}, {\"bbox_2d\": [156, 268, 398, 281], \"label\": \"Text\"}, {\"bbox_2d\": [390, 349, 648, 360], \"label\": \"Text\"}, {\"bbox_2d\": [156, 374, 884, 461], \"label\": \"Text\"}, {\"bbox_2d\": [156, 491, 300, 501], \"label\": \"Text\"}, {\"bbox_2d\": [156, 509, 884, 533], \"label\": \"Text\"}, {\"bbox_2d\": [156, 923, 338, 932], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 961, 390, 970], \"label\": \"Page-footer\"}, {\"bbox_2d\": [783, 869, 971, 988], \"label\": \"Picture\"}, {\"bbox_2d\": [156, 879, 198, 890], \"label\": \"Text\"}, {\"bbox_2d\": [419, 327, 623, 339], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [156, 180, 884, 207], \"label\": \"Text\"}, {\"bbox_2d\": [597, 239, 881, 281], \"label\": \"Text\"}, {\"bbox_2d\": [156, 269, 397, 281], \"label\": \"Text\"}, {\"bbox_2d\": [156, 375, 884, 461], \"label\": \"Text\"}, {\"bbox_2d\": [156, 492, 300, 501], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 511, 883, 533], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 155, 349, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [419, 328, 621, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [391, 350, 650, 362], \"label\": \"Section-header\"}, {\"bbox_2d\": [186, 91, 884, 132], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 880, 198, 891], \"label\": \"Text\"}, {\"bbox_2d\": [155, 923, 337, 931], \"label\": \"Text\"}, {\"bbox_2d\": [155, 963, 390, 971], \"label\": \"Text\"}, {\"bbox_2d\": [782, 871, 969, 987], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [556, 22, 883, 41], \"label\": \"Text\"}, {\"bbox_2d\": [190, 493, 548, 522], \"label\": \"Text\"}, {\"bbox_2d\": [418, 583, 722, 655], \"label\": \"Text\"}, {\"bbox_2d\": [78, 583, 381, 800], \"label\": \"Text\"}, {\"bbox_2d\": [78, 843, 379, 947], \"label\": \"Text\"}, {\"bbox_2d\": [822, 719, 988, 777], \"label\": \"Text\"}, {\"bbox_2d\": [822, 801, 969, 809], \"label\": \"Text\"}, {\"bbox_2d\": [983, 994, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [418, 922, 641, 947], \"label\": \"Caption\"}, {\"bbox_2d\": [418, 848, 635, 873], \"label\": \"Caption\"}, {\"bbox_2d\": [418, 772, 544, 797], \"label\": \"Caption\"}, {\"bbox_2d\": [418, 823, 562, 839], \"label\": \"Picture\"}, {\"bbox_2d\": [416, 839, 567, 921], \"label\": \"Picture\"}, {\"bbox_2d\": [417, 922, 642, 947], \"label\": \"Picture\"}, {\"bbox_2d\": [78, 826, 158, 836], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 185, 498, 431], \"label\": \"Picture\"}, {\"bbox_2d\": [628, 94, 943, 217], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 71, 1035, 518], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [556, 21, 885, 39], \"label\": \"Text\"}, {\"bbox_2d\": [186, 491, 551, 519], \"label\": \"Text\"}, {\"bbox_2d\": [417, 584, 725, 651], \"label\": \"Text\"}, {\"bbox_2d\": [77, 584, 382, 799], \"label\": \"Text\"}, {\"bbox_2d\": [77, 843, 384, 947], \"label\": \"Text\"}, {\"bbox_2d\": [796, 706, 978, 809], \"label\": \"Text\"}, {\"bbox_2d\": [77, 827, 158, 836], \"label\": \"Section-header\"}, {\"bbox_2d\": [405, 733, 650, 767], \"label\": \"Picture\"}, {\"bbox_2d\": [412, 812, 564, 845], \"label\": \"Picture\"}, {\"bbox_2d\": [412, 884, 575, 918], \"label\": \"Picture\"}, {\"bbox_2d\": [417, 772, 548, 795], \"label\": \"Caption\"}, {\"bbox_2d\": [417, 848, 636, 871], \"label\": \"Caption\"}, {\"bbox_2d\": [418, 922, 643, 945], \"label\": \"Caption\"}, {\"bbox_2d\": [982, 995, 989, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [0, 22, 549, 452], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 70, 1036, 518], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [354, 83, 916, 113], \"label\": \"Text\"}, {\"bbox_2d\": [354, 130, 536, 143], \"label\": \"Text\"}, {\"bbox_2d\": [414, 170, 926, 215], \"label\": \"Text\"}, {\"bbox_2d\": [414, 240, 936, 333], \"label\": \"Text\"}, {\"bbox_2d\": [414, 358, 926, 449], \"label\": \"Text\"}, {\"bbox_2d\": [414, 459, 936, 567], \"label\": \"Text\"}, {\"bbox_2d\": [414, 577, 937, 638], \"label\": \"Text\"}, {\"bbox_2d\": [414, 664, 878, 693], \"label\": \"Text\"}, {\"bbox_2d\": [414, 719, 907, 779], \"label\": \"Text\"}, {\"bbox_2d\": [354, 703, 517, 715], \"label\": \"Text\"}, {\"bbox_2d\": [354, 812, 869, 849], \"label\": \"Text\"}, {\"bbox_2d\": [414, 875, 873, 905], \"label\": \"Text\"}, {\"bbox_2d\": [414, 929, 614, 943], \"label\": \"Text\"}, {\"bbox_2d\": [292, 789, 490, 803], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 646, 443, 661], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 577, 397, 589], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 341, 441, 355], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 223, 461, 238], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 154, 413, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [600, 979, 935, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [354, 859, 413, 874], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 914, 424, 929], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [354, 85, 911, 114], \"label\": \"List-item\"}, {\"bbox_2d\": [354, 132, 536, 145], \"label\": \"Text\"}, {\"bbox_2d\": [354, 155, 412, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 226, 461, 239], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 343, 440, 356], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 704, 516, 717], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 649, 443, 662], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 790, 489, 803], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 860, 412, 873], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 915, 424, 928], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 931, 612, 944], \"label\": \"Text\"}, {\"bbox_2d\": [415, 876, 867, 905], \"label\": \"Text\"}, {\"bbox_2d\": [354, 837, 536, 850], \"label\": \"Text\"}, {\"bbox_2d\": [354, 813, 864, 826], \"label\": \"Text\"}, {\"bbox_2d\": [415, 719, 902, 779], \"label\": \"Text\"}, {\"bbox_2d\": [415, 664, 876, 693], \"label\": \"Text\"}, {\"bbox_2d\": [415, 359, 921, 451], \"label\": \"Text\"}, {\"bbox_2d\": [415, 242, 932, 333], \"label\": \"Text\"}, {\"bbox_2d\": [415, 171, 921, 216], \"label\": \"Text\"}, {\"bbox_2d\": [354, 461, 929, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [354, 578, 932, 638], \"label\": \"List-item\"}, {\"bbox_2d\": [594, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 436, 786, 449], \"label\": \"Caption\"}, {\"bbox_2d\": [141, 521, 892, 829], \"label\": \"Picture\"}, {\"bbox_2d\": [274, 879, 786, 890], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [123, 112, 866, 395], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [144, 504, 906, 834], \"label\": \"Picture\"}, {\"bbox_2d\": [273, 435, 758, 449], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 876, 758, 890], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [118, 106, 899, 394], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [938, 41, 979, 67], \"label\": \"Page-header\"}, {\"bbox_2d\": [999, 206, 1018, 334], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 192, 352, 254], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 258, 335, 318], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 323, 346, 373], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 376, 345, 400], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 412, 349, 594], \"label\": \"Text\"}, {\"bbox_2d\": [78, 637, 354, 672], \"label\": \"Text\"}, {\"bbox_2d\": [78, 679, 341, 728], \"label\": \"Text\"}, {\"bbox_2d\": [78, 828, 353, 888], \"label\": \"Text\"}, {\"bbox_2d\": [381, 839, 657, 888], \"label\": \"Text\"}, {\"bbox_2d\": [381, 895, 654, 943], \"label\": \"Text\"}, {\"bbox_2d\": [381, 744, 649, 767], \"label\": \"Text\"}, {\"bbox_2d\": [78, 613, 290, 629], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 450, 657, 594], \"label\": \"Section-header\"}, {\"bbox_2d\": [381, 560, 539, 575], \"label\": \"Section-header\"}, {\"bbox_2d\": [683, 609, 869, 642], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 737, 291, 747], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 751, 234, 762], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 766, 278, 777], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 781, 319, 805], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 809, 296, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 824, 353, 835], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 744, 649, 767], \"label\": \"Text\"}, {\"bbox_2d\": [381, 716, 655, 726], \"label\": \"Text\"}, {\"bbox_2d\": [381, 691, 655, 701], \"label\": \"Text\"}, {\"bbox_2d\": [381, 514, 649, 539], \"label\": \"Text\"}, {\"bbox_2d\": [381, 381, 639, 429], \"label\": \"Text\"}, {\"bbox_2d\": [381, 313, 642, 373], \"label\": \"Text\"}, {\"bbox_2d\": [381, 231, 653, 306], \"label\": \"Text\"}, {\"bbox_2d\": [381, 192, 625, 225], \"label\": \"Section-header\"}, {\"bbox_2d\": [683, 194, 958, 217], \"label\": \"Text\"}, {\"bbox_2d\": [683, 221, 958, 269], \"label\": \"Text\"}, {\"bbox_2d\": [683, 273, 958, 362], \"label\": \"Text\"}, {\"bbox_2d\": [683, 365, 947, 388], \"label\": \"Text\"}, {\"bbox_2d\": [683, 392, 932, 404], \"label\": \"Text\"}, {\"bbox_2d\": [683, 408, 958, 442], \"label\": \"Text\"}, {\"bbox_2d\": [683, 454, 948, 514], \"label\": \"Text\"}, {\"bbox_2d\": [683, 523, 958, 546], \"label\": \"Text\"}, {\"bbox_2d\": [683, 555, 958, 589], \"label\": \"Text\"}, {\"bbox_2d\": [683, 650, 958, 733], \"label\": \"Text\"}, {\"bbox_2d\": [683, 767, 948, 841], \"label\": \"Text\"}, {\"bbox_2d\": [683, 846, 958, 921], \"label\": \"Text\"}, {\"bbox_2d\": [381, 449, 657, 464], \"label\": \"Section-header\"}, {\"bbox_2d\": [683, 880, 958, 921], \"label\": \"Text\"}, {\"bbox_2d\": [381, 721, 524, 736], \"label\": \"Section-header\"}, {\"bbox_2d\": [381, 635, 504, 645], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 650, 578, 660], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 665, 544, 675], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 771, 624, 794], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 799, 544, 809], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 814, 504, 824], \"label\": \"List-item\"}, {\"bbox_2d\": [941, 994, 993, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [935, 34, 974, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [795, 68, 976, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [1002, 206, 1018, 334], \"label\": \"Page-header\"}, {\"bbox_2d\": [381, 189, 620, 225], \"label\": \"Section-header\"}, {\"bbox_2d\": [381, 446, 653, 465], \"label\": \"Section-header\"}, {\"bbox_2d\": [381, 556, 536, 575], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 610, 287, 629], \"label\": \"Section-header\"}, {\"bbox_2d\": [684, 607, 867, 643], \"label\": \"Section-header\"}, {\"bbox_2d\": [381, 717, 520, 736], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 410, 351, 593], \"label\": \"Text\"}, {\"bbox_2d\": [78, 636, 353, 672], \"label\": \"Text\"}, {\"bbox_2d\": [78, 680, 339, 728], \"label\": \"Text\"}, {\"bbox_2d\": [78, 827, 353, 888], \"label\": \"Text\"}, {\"bbox_2d\": [381, 838, 657, 886], \"label\": \"Text\"}, {\"bbox_2d\": [381, 894, 654, 942], \"label\": \"Text\"}, {\"bbox_2d\": [684, 848, 960, 921], \"label\": \"Text\"}, {\"bbox_2d\": [684, 767, 955, 840], \"label\": \"Text\"}, {\"bbox_2d\": [684, 650, 959, 759], \"label\": \"Text\"}, {\"bbox_2d\": [381, 582, 637, 630], \"label\": \"Text\"}, {\"bbox_2d\": [381, 689, 651, 700], \"label\": \"Text\"}, {\"bbox_2d\": [381, 472, 640, 508], \"label\": \"Text\"}, {\"bbox_2d\": [381, 516, 644, 539], \"label\": \"Text\"}, {\"bbox_2d\": [381, 381, 637, 429], \"label\": \"Text\"}, {\"bbox_2d\": [381, 312, 640, 373], \"label\": \"Text\"}, {\"bbox_2d\": [381, 232, 651, 304], \"label\": \"Text\"}, {\"bbox_2d\": [684, 192, 956, 216], \"label\": \"Text\"}, {\"bbox_2d\": [684, 454, 960, 514], \"label\": \"Text\"}, {\"bbox_2d\": [684, 522, 959, 546], \"label\": \"Text\"}, {\"bbox_2d\": [684, 554, 950, 590], \"label\": \"Text\"}, {\"bbox_2d\": [381, 771, 619, 795], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 799, 538, 810], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 815, 504, 826], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 736, 288, 747], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 752, 230, 763], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 768, 275, 779], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 784, 317, 807], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 812, 291, 823], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 635, 502, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 650, 575, 662], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 666, 537, 678], \"label\": \"List-item\"}, {\"bbox_2d\": [381, 743, 647, 767], \"label\": \"Text\"}, {\"bbox_2d\": [78, 192, 352, 253], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 257, 338, 318], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 322, 323, 370], \"label\": \"List-item\"}, {\"bbox_2d\": [78, 375, 344, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 220, 953, 268], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 273, 958, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 338, 941, 361], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 366, 945, 402], \"label\": \"List-item\"}, {\"bbox_2d\": [684, 406, 959, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [943, 991, 991, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [81, 109, 214, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 127, 934, 185], \"label\": \"Text\"}, {\"bbox_2d\": [81, 529, 503, 745], \"label\": \"Text\"}, {\"bbox_2d\": [81, 757, 503, 894], \"label\": \"Text\"}, {\"bbox_2d\": [533, 843, 936, 935], \"label\": \"Text\"}, {\"bbox_2d\": [533, 615, 946, 832], \"label\": \"Text\"}, {\"bbox_2d\": [533, 529, 935, 605], \"label\": \"Text\"}, {\"bbox_2d\": [81, 1004, 246, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [79, 197, 956, 499], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [81, 111, 214, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 127, 932, 185], \"label\": \"Text\"}, {\"bbox_2d\": [81, 530, 502, 744], \"label\": \"Text\"}, {\"bbox_2d\": [81, 757, 502, 893], \"label\": \"Text\"}, {\"bbox_2d\": [533, 530, 937, 603], \"label\": \"Text\"}, {\"bbox_2d\": [533, 616, 949, 830], \"label\": \"Text\"}, {\"bbox_2d\": [533, 844, 943, 932], \"label\": \"Text\"}, {\"bbox_2d\": [81, 1005, 246, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [75, 195, 957, 504], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [491, 71, 543, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 95, 993, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 123, 194, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 257, 235, 268], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 366, 213, 377], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 138, 1009, 160], \"label\": \"Text\"}, {\"bbox_2d\": [25, 169, 1009, 204], \"label\": \"Text\"}, {\"bbox_2d\": [25, 212, 1009, 246], \"label\": \"Text\"}, {\"bbox_2d\": [25, 274, 1009, 354], \"label\": \"Text\"}, {\"bbox_2d\": [25, 382, 1009, 415], \"label\": \"Text\"}, {\"bbox_2d\": [25, 424, 1009, 481], \"label\": \"Text\"}, {\"bbox_2d\": [508, 512, 525, 523], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [490, 71, 543, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 95, 993, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 122, 195, 133], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 138, 1009, 160], \"label\": \"Text\"}, {\"bbox_2d\": [25, 169, 1009, 202], \"label\": \"Text\"}, {\"bbox_2d\": [25, 211, 1009, 245], \"label\": \"Text\"}, {\"bbox_2d\": [25, 274, 1009, 353], \"label\": \"Text\"}, {\"bbox_2d\": [25, 382, 1009, 415], \"label\": \"Text\"}, {\"bbox_2d\": [25, 424, 1009, 480], \"label\": \"Text\"}, {\"bbox_2d\": [25, 366, 213, 377], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 258, 234, 269], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 512, 525, 523], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [145, 209, 382, 220], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 522, 343, 534], \"label\": \"Section-header\"}, {\"bbox_2d\": [434, 574, 596, 585], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 536, 215, 545], \"label\": \"Text\"}, {\"bbox_2d\": [813, 1002, 976, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [148, 234, 877, 472], \"label\": \"Picture\"}, {\"bbox_2d\": [146, 555, 318, 947], \"label\": \"Picture\"}, {\"bbox_2d\": [426, 617, 912, 939], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [1014, 363, 1024, 432], \"label\": \"Page-header\"}, {\"bbox_2d\": [813, 1002, 976, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [228, 293, 288, 304], \"label\": \"Section-header\"}, {\"bbox_2d\": [413, 293, 461, 304], \"label\": \"Section-header\"}, {\"bbox_2d\": [748, 293, 845, 304], \"label\": \"Section-header\"}, {\"bbox_2d\": [558, 293, 676, 304], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 209, 381, 219], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 523, 343, 533], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 538, 213, 544], \"label\": \"Text\"}, {\"bbox_2d\": [208, 324, 307, 393], \"label\": \"Text\"}, {\"bbox_2d\": [383, 324, 492, 393], \"label\": \"Text\"}, {\"bbox_2d\": [564, 324, 670, 452], \"label\": \"Text\"}, {\"bbox_2d\": [556, 461, 678, 468], \"label\": \"Text\"}, {\"bbox_2d\": [139, 560, 319, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [424, 644, 540, 915], \"label\": \"Picture\"}, {\"bbox_2d\": [677, 646, 785, 911], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 649, 665, 909], \"label\": \"Picture\"}, {\"bbox_2d\": [799, 647, 909, 944], \"label\": \"Picture\"}, {\"bbox_2d\": [804, 621, 844, 630], \"label\": \"Caption\"}, {\"bbox_2d\": [681, 621, 729, 630], \"label\": \"Caption\"}, {\"bbox_2d\": [558, 621, 584, 630], \"label\": \"Caption\"}, {\"bbox_2d\": [434, 621, 473, 630], \"label\": \"Caption\"}, {\"bbox_2d\": [434, 576, 596, 585], \"label\": \"Section-header\"}, {\"bbox_2d\": [434, 590, 499, 597], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 737, 109], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 124, 814, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 153, 911, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 209, 677, 224], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 238, 910, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 266, 595, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 294, 538, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 322, 801, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 96, 737, 110], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 124, 815, 139], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 153, 910, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 209, 676, 224], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 238, 910, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 266, 596, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 294, 538, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 323, 801, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [137, 37, 320, 46], \"label\": \"Page-header\"}, {\"bbox_2d\": [654, 37, 898, 46], \"label\": \"Page-header\"}, {\"bbox_2d\": [137, 67, 898, 98], \"label\": \"Text\"}, {\"bbox_2d\": [137, 152, 898, 216], \"label\": \"Text\"}, {\"bbox_2d\": [137, 272, 898, 320], \"label\": \"Text\"}, {\"bbox_2d\": [137, 340, 898, 388], \"label\": \"Text\"}, {\"bbox_2d\": [137, 409, 898, 438], \"label\": \"Text\"}, {\"bbox_2d\": [137, 460, 898, 508], \"label\": \"Text\"}, {\"bbox_2d\": [137, 563, 898, 593], \"label\": \"Text\"}, {\"bbox_2d\": [137, 614, 378, 627], \"label\": \"Text\"}, {\"bbox_2d\": [137, 936, 272, 950], \"label\": \"Section-header\"}, {\"bbox_2d\": [169, 885, 898, 915], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 769, 898, 883], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 750, 735, 763], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 698, 898, 746], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 647, 898, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 528, 681, 542], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 239, 420, 251], \"label\": \"Section-header\"}, {\"bbox_2d\": [169, 118, 353, 131], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 974, 549, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [138, 35, 322, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [654, 35, 903, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 65, 902, 98], \"label\": \"Text\"}, {\"bbox_2d\": [169, 116, 358, 132], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 236, 426, 252], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 934, 278, 950], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 150, 902, 218], \"label\": \"Text\"}, {\"bbox_2d\": [138, 270, 902, 320], \"label\": \"Text\"}, {\"bbox_2d\": [138, 338, 902, 388], \"label\": \"Text\"}, {\"bbox_2d\": [138, 406, 902, 439], \"label\": \"Text\"}, {\"bbox_2d\": [138, 457, 902, 507], \"label\": \"Text\"}, {\"bbox_2d\": [138, 526, 687, 542], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 559, 902, 593], \"label\": \"Text\"}, {\"bbox_2d\": [138, 611, 384, 627], \"label\": \"Text\"}, {\"bbox_2d\": [169, 645, 902, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 696, 903, 746], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 747, 852, 763], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 764, 902, 882], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 883, 902, 916], \"label\": \"List-item\"}, {\"bbox_2d\": [534, 972, 554, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [116, 78, 960, 112], \"label\": \"Text\"}, {\"bbox_2d\": [116, 117, 960, 172], \"label\": \"Text\"}, {\"bbox_2d\": [160, 178, 555, 193], \"label\": \"Text\"}, {\"bbox_2d\": [116, 198, 960, 226], \"label\": \"Text\"}, {\"bbox_2d\": [116, 232, 555, 260], \"label\": \"Text\"}, {\"bbox_2d\": [116, 265, 555, 293], \"label\": \"Text\"}, {\"bbox_2d\": [116, 298, 555, 326], \"label\": \"Text\"}, {\"bbox_2d\": [116, 331, 960, 369], \"label\": \"Text\"}, {\"bbox_2d\": [116, 374, 960, 429], \"label\": \"Text\"}, {\"bbox_2d\": [160, 434, 960, 489], \"label\": \"Text\"}, {\"bbox_2d\": [116, 494, 960, 547], \"label\": \"Text\"}, {\"bbox_2d\": [116, 552, 930, 606], \"label\": \"Text\"}, {\"bbox_2d\": [116, 612, 960, 665], \"label\": \"Text\"}, {\"bbox_2d\": [116, 670, 960, 702], \"label\": \"Text\"}, {\"bbox_2d\": [116, 708, 960, 823], \"label\": \"Text\"}, {\"bbox_2d\": [160, 828, 960, 842], \"label\": \"Text\"}, {\"bbox_2d\": [176, 861, 960, 933], \"label\": \"Text\"}, {\"bbox_2d\": [527, 38, 546, 49], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [116, 862, 959, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [528, 39, 544, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 74, 959, 112], \"label\": \"Text\"}, {\"bbox_2d\": [161, 175, 552, 191], \"label\": \"Text\"}, {\"bbox_2d\": [116, 195, 959, 231], \"label\": \"Text\"}, {\"bbox_2d\": [115, 116, 960, 172], \"label\": \"Text\"}, {\"bbox_2d\": [116, 234, 959, 270], \"label\": \"Text\"}, {\"bbox_2d\": [161, 274, 556, 290], \"label\": \"Text\"}, {\"bbox_2d\": [162, 293, 643, 309], \"label\": \"Text\"}, {\"bbox_2d\": [162, 313, 820, 329], \"label\": \"Text\"}, {\"bbox_2d\": [116, 332, 959, 367], \"label\": \"Text\"}, {\"bbox_2d\": [162, 371, 955, 388], \"label\": \"Text\"}, {\"bbox_2d\": [116, 392, 960, 427], \"label\": \"Text\"}, {\"bbox_2d\": [162, 430, 915, 448], \"label\": \"Text\"}, {\"bbox_2d\": [116, 450, 960, 486], \"label\": \"Text\"}, {\"bbox_2d\": [116, 489, 960, 545], \"label\": \"Text\"}, {\"bbox_2d\": [116, 549, 959, 605], \"label\": \"Text\"}, {\"bbox_2d\": [115, 607, 960, 662], \"label\": \"Text\"}, {\"bbox_2d\": [116, 667, 959, 703], \"label\": \"Text\"}, {\"bbox_2d\": [115, 706, 961, 822], \"label\": \"Text\"}, {\"bbox_2d\": [161, 824, 959, 842], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 368, 911, 439], \"label\": \"Text\"}, {\"bbox_2d\": [121, 495, 858, 510], \"label\": \"Text\"}, {\"bbox_2d\": [121, 568, 824, 582], \"label\": \"Text\"}, {\"bbox_2d\": [121, 594, 911, 773], \"label\": \"Text\"}, {\"bbox_2d\": [121, 784, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [401, 521, 911, 557], \"label\": \"Formula\"}, {\"bbox_2d\": [364, 451, 911, 485], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 911, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 369, 911, 439], \"label\": \"Text\"}, {\"bbox_2d\": [121, 495, 858, 510], \"label\": \"Text\"}, {\"bbox_2d\": [121, 566, 824, 582], \"label\": \"Text\"}, {\"bbox_2d\": [121, 594, 911, 784], \"label\": \"Text\"}, {\"bbox_2d\": [121, 785, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [364, 448, 911, 491], \"label\": \"Formula\"}, {\"bbox_2d\": [402, 520, 911, 559], \"label\": \"Formula\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [354, 81, 882, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [384, 161, 793, 191], \"label\": \"Text\"}, {\"bbox_2d\": [384, 263, 953, 308], \"label\": \"Text\"}, {\"bbox_2d\": [384, 416, 918, 446], \"label\": \"Text\"}, {\"bbox_2d\": [384, 554, 918, 582], \"label\": \"Text\"}, {\"bbox_2d\": [384, 725, 953, 788], \"label\": \"Text\"}, {\"bbox_2d\": [245, 888, 953, 984], \"label\": \"Text\"}, {\"bbox_2d\": [258, 823, 330, 862], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 616, 933, 714], \"label\": \"Section-header\"}, {\"bbox_2d\": [384, 479, 936, 543], \"label\": \"Section-header\"}, {\"bbox_2d\": [384, 342, 911, 406], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 223, 825, 251], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [497, 161, 909, 189], \"label\": \"Text\"}, {\"bbox_2d\": [497, 263, 957, 307], \"label\": \"Text\"}, {\"bbox_2d\": [497, 416, 924, 444], \"label\": \"Text\"}, {\"bbox_2d\": [467, 476, 951, 543], \"label\": \"Section-header\"}, {\"bbox_2d\": [467, 339, 927, 406], \"label\": \"Section-header\"}, {\"bbox_2d\": [467, 221, 938, 253], \"label\": \"Section-header\"}, {\"bbox_2d\": [467, 84, 897, 151], \"label\": \"Section-header\"}, {\"bbox_2d\": [467, 613, 934, 716], \"label\": \"Section-header\"}, {\"bbox_2d\": [243, 824, 330, 862], \"label\": \"Section-header\"}, {\"bbox_2d\": [497, 553, 922, 581], \"label\": \"Text\"}, {\"bbox_2d\": [497, 725, 952, 786], \"label\": \"Text\"}, {\"bbox_2d\": [243, 888, 952, 983], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 48, 296, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [182, 171, 653, 309], \"label\": \"Text\"}, {\"bbox_2d\": [182, 357, 707, 479], \"label\": \"Text\"}, {\"bbox_2d\": [182, 526, 617, 572], \"label\": \"Text\"}, {\"bbox_2d\": [266, 589, 872, 636], \"label\": \"Text\"}, {\"bbox_2d\": [121, 972, 446, 988], \"label\": \"Page-footer\"}, {\"bbox_2d\": [465, 144, 577, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [463, 327, 577, 359], \"label\": \"Section-header\"}, {\"bbox_2d\": [465, 496, 577, 528], \"label\": \"Section-header\"}, {\"bbox_2d\": [794, 653, 924, 935], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 175, 651, 189], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 190, 475, 204], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 205, 541, 219], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 220, 426, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 235, 695, 249], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 250, 662, 264], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 265, 531, 279], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 280, 628, 294], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 295, 660, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 358, 412, 373], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 373, 720, 388], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 388, 488, 403], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 403, 695, 418], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 418, 421, 433], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 433, 521, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 448, 354, 463], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 463, 390, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 527, 368, 541], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 542, 627, 556], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 557, 370, 571], \"label\": \"List-item\"}, {\"bbox_2d\": [265, 590, 889, 634], \"label\": \"Text\"}, {\"bbox_2d\": [793, 653, 919, 934], \"label\": \"Text\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}, {\"bbox_2d\": [464, 145, 575, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [447, 159, 593, 174], \"label\": \"Section-header\"}, {\"bbox_2d\": [462, 328, 578, 343], \"label\": \"Section-header\"}, {\"bbox_2d\": [488, 343, 552, 358], \"label\": \"Section-header\"}, {\"bbox_2d\": [467, 497, 573, 511], \"label\": \"Section-header\"}, {\"bbox_2d\": [464, 512, 576, 526], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 84, 873, 97], \"label\": \"Text\"}, {\"bbox_2d\": [293, 416, 931, 460], \"label\": \"Text\"}, {\"bbox_2d\": [293, 479, 918, 539], \"label\": \"Text\"}, {\"bbox_2d\": [293, 558, 931, 601], \"label\": \"Text\"}, {\"bbox_2d\": [293, 620, 931, 664], \"label\": \"Text\"}, {\"bbox_2d\": [293, 683, 884, 711], \"label\": \"Text\"}, {\"bbox_2d\": [618, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 109, 837, 380], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 873, 98], \"label\": \"Text\"}, {\"bbox_2d\": [486, 109, 689, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 133, 493, 168], \"label\": \"Text\"}, {\"bbox_2d\": [506, 133, 547, 156], \"label\": \"Text\"}, {\"bbox_2d\": [594, 133, 817, 168], \"label\": \"Text\"}, {\"bbox_2d\": [317, 180, 831, 215], \"label\": \"Text\"}, {\"bbox_2d\": [317, 332, 797, 379], \"label\": \"Text\"}, {\"bbox_2d\": [803, 321, 844, 332], \"label\": \"Text\"}, {\"bbox_2d\": [293, 416, 931, 461], \"label\": \"Text\"}, {\"bbox_2d\": [293, 557, 931, 602], \"label\": \"Text\"}, {\"bbox_2d\": [293, 683, 884, 712], \"label\": \"Text\"}, {\"bbox_2d\": [293, 479, 917, 539], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 620, 931, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [619, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [316, 227, 783, 262], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [699, 92, 721, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 327, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [132, 111, 502, 134], \"label\": \"Text\"}, {\"bbox_2d\": [132, 141, 502, 234], \"label\": \"Text\"}, {\"bbox_2d\": [132, 242, 502, 278], \"label\": \"Text\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [160, 312, 275, 322], \"label\": \"Text\"}, {\"bbox_2d\": [159, 342, 276, 431], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 441, 274, 526], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 537, 274, 627], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 637, 251, 744], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 754, 251, 845], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 865, 250, 966], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 132, 661, 234], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 248, 661, 344], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 360, 661, 459], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 462, 661, 553], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 565, 661, 654], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 666, 660, 764], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 768, 660, 862], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 871, 660, 968], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 92, 326, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 92, 721, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [131, 110, 501, 134], \"label\": \"Text\"}, {\"bbox_2d\": [130, 142, 500, 234], \"label\": \"Text\"}, {\"bbox_2d\": [131, 242, 498, 279], \"label\": \"Text\"}, {\"bbox_2d\": [162, 314, 274, 322], \"label\": \"Text\"}, {\"bbox_2d\": [160, 337, 498, 428], \"label\": \"Picture\"}, {\"bbox_2d\": [161, 436, 498, 527], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 535, 498, 626], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 635, 498, 743], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 750, 497, 846], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 854, 499, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 872, 895, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 770, 894, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 667, 894, 764], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 565, 894, 661], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 462, 894, 558], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 359, 894, 455], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 241, 894, 352], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 126, 894, 239], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [419, 468, 631, 483], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [419, 466, 631, 482], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 72, 951, 98], \"label\": \"Text\"}, {\"bbox_2d\": [91, 72, 506, 101], \"label\": \"Text\"}, {\"bbox_2d\": [680, 128, 951, 158], \"label\": \"Formula\"}, {\"bbox_2d\": [134, 111, 506, 190], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 183, 951, 212], \"label\": \"Text\"}, {\"bbox_2d\": [91, 198, 506, 211], \"label\": \"Text\"}, {\"bbox_2d\": [536, 269, 951, 308], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 269, 408, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 283, 506, 322], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 324, 506, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 366, 951, 405], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 365, 506, 405], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 407, 506, 447], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 449, 506, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 491, 506, 530], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 532, 506, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 570, 506, 582], \"label\": \"List-item\"}, {\"bbox_2d\": [570, 269, 736, 281], \"label\": \"Text\"}, {\"bbox_2d\": [536, 407, 951, 463], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 465, 951, 568], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [91, 71, 506, 99], \"label\": \"Text\"}, {\"bbox_2d\": [536, 69, 951, 99], \"label\": \"Text\"}, {\"bbox_2d\": [91, 197, 506, 210], \"label\": \"Text\"}, {\"bbox_2d\": [536, 182, 951, 210], \"label\": \"Text\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [137, 112, 506, 184], \"label\": \"Formula\"}, {\"bbox_2d\": [657, 126, 951, 158], \"label\": \"Formula\"}, {\"bbox_2d\": [99, 282, 506, 321], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 268, 381, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 323, 505, 362], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 364, 506, 403], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 405, 506, 444], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 446, 506, 485], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 487, 506, 540], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 542, 506, 567], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 569, 506, 581], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 282, 951, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 337, 951, 403], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 405, 951, 458], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 460, 951, 567], \"label\": \"List-item\"}, {\"bbox_2d\": [567, 268, 731, 280], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 124, 645, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 165, 907, 178], \"label\": \"Text\"}, {\"bbox_2d\": [293, 197, 914, 272], \"label\": \"Text\"}, {\"bbox_2d\": [293, 291, 914, 320], \"label\": \"Text\"}, {\"bbox_2d\": [293, 339, 919, 397], \"label\": \"Text\"}, {\"bbox_2d\": [293, 416, 923, 445], \"label\": \"Text\"}, {\"bbox_2d\": [293, 452, 482, 462], \"label\": \"Text\"}, {\"bbox_2d\": [293, 482, 628, 494], \"label\": \"Text\"}, {\"bbox_2d\": [293, 599, 920, 628], \"label\": \"Text\"}, {\"bbox_2d\": [293, 635, 611, 645], \"label\": \"Text\"}, {\"bbox_2d\": [293, 665, 628, 677], \"label\": \"Text\"}, {\"bbox_2d\": [293, 511, 919, 565], \"label\": \"Table\"}, {\"bbox_2d\": [293, 691, 932, 796], \"label\": \"Table\"}, {\"bbox_2d\": [293, 828, 906, 855], \"label\": \"Text\"}, {\"bbox_2d\": [293, 875, 923, 904], \"label\": \"Text\"}, {\"bbox_2d\": [906, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 645, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 166, 906, 179], \"label\": \"Text\"}, {\"bbox_2d\": [293, 197, 914, 273], \"label\": \"Text\"}, {\"bbox_2d\": [293, 291, 913, 320], \"label\": \"Text\"}, {\"bbox_2d\": [293, 338, 918, 398], \"label\": \"Text\"}, {\"bbox_2d\": [293, 416, 923, 463], \"label\": \"Text\"}, {\"bbox_2d\": [293, 482, 628, 495], \"label\": \"Text\"}, {\"bbox_2d\": [293, 599, 921, 628], \"label\": \"Text\"}, {\"bbox_2d\": [293, 632, 613, 646], \"label\": \"Text\"}, {\"bbox_2d\": [293, 664, 628, 678], \"label\": \"Text\"}, {\"bbox_2d\": [293, 828, 905, 856], \"label\": \"Text\"}, {\"bbox_2d\": [293, 875, 923, 903], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 507, 921, 567], \"label\": \"Table\"}, {\"bbox_2d\": [293, 691, 932, 795], \"label\": \"Table\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [225, 327, 744, 553], \"label\": \"Picture\"}, {\"bbox_2d\": [225, 571, 808, 716], \"label\": \"Caption\"}, {\"bbox_2d\": [509, 908, 526, 922], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [224, 327, 809, 552], \"label\": \"Picture\"}, {\"bbox_2d\": [226, 573, 808, 719], \"label\": \"Caption\"}, {\"bbox_2d\": [508, 910, 525, 921], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [549, 570, 977, 647], \"label\": \"Text\"}, {\"bbox_2d\": [546, 722, 990, 881], \"label\": \"Text\"}, {\"bbox_2d\": [23, 46, 284, 78], \"label\": \"Section-header\"}, {\"bbox_2d\": [549, 526, 827, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [549, 669, 766, 696], \"label\": \"Section-header\"}, {\"bbox_2d\": [549, 49, 775, 78], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 102, 111, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 127, 124, 155], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 294, 222, 322], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 432, 84, 443], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 598, 147, 610], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 610, 242, 749], \"label\": \"Text\"}, {\"bbox_2d\": [25, 158, 232, 200], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 203, 212, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 325, 230, 349], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 352, 232, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 448, 223, 490], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 494, 237, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 540, 236, 582], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 608, 180, 625], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 629, 219, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 676, 212, 688], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 692, 241, 749], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 767, 200, 800], \"label\": \"Section-header\"}, {\"bbox_2d\": [255, 105, 347, 115], \"label\": \"Section-header\"}, {\"bbox_2d\": [255, 127, 317, 139], \"label\": \"Section-header\"}, {\"bbox_2d\": [255, 231, 392, 243], \"label\": \"Section-header\"}, {\"bbox_2d\": [255, 143, 476, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 171, 453, 213], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 250, 466, 292], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 295, 469, 367], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 371, 475, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 448, 475, 520], \"label\": \"List-item\"}, {\"bbox_2d\": [268, 524, 468, 566], \"label\": \"List-item\"}, {\"bbox_2d\": [268, 569, 458, 627], \"label\": \"List-item\"}, {\"bbox_2d\": [268, 629, 458, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [268, 675, 456, 761], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 431, 426, 443], \"label\": \"Section-header\"}, {\"bbox_2d\": [255, 456, 475, 520], \"label\": \"Section-header\"}, {\"bbox_2d\": [255, 782, 381, 794], \"label\": \"Section-header\"}, {\"bbox_2d\": [255, 798, 460, 824], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 827, 457, 870], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 874, 448, 886], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 890, 445, 946], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 813, 216, 839], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 843, 223, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 873, 223, 908], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 913, 200, 925], \"label\": \"List-item\"}, {\"bbox_2d\": [29, 960, 50, 978], \"label\": \"Page-footer\"}, {\"bbox_2d\": [965, 960, 996, 978], \"label\": \"Page-footer\"}, {\"bbox_2d\": [303, 950, 469, 988], \"label\": \"Picture\"}, {\"bbox_2d\": [547, 950, 715, 988], \"label\": \"Picture\"}, {\"bbox_2d\": [724, 483, 817, 496], \"label\": \"Caption\"}, {\"bbox_2d\": [528, 87, 911, 477], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [22, 40, 284, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [254, 100, 346, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [253, 123, 316, 139], \"label\": \"Section-header\"}, {\"bbox_2d\": [17, 100, 110, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 123, 160, 154], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 290, 225, 322], \"label\": \"Section-header\"}, {\"bbox_2d\": [253, 229, 393, 246], \"label\": \"Section-header\"}, {\"bbox_2d\": [253, 427, 426, 444], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 427, 83, 444], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 595, 147, 611], \"label\": \"Section-header\"}, {\"bbox_2d\": [27, 767, 199, 798], \"label\": \"Section-header\"}, {\"bbox_2d\": [253, 778, 381, 794], \"label\": \"Section-header\"}, {\"bbox_2d\": [549, 516, 827, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [551, 661, 767, 698], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 153, 233, 200], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 199, 228, 276], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 321, 229, 352], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 351, 231, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 443, 214, 459], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 458, 220, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 488, 238, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 534, 241, 581], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 610, 180, 626], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 625, 223, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 671, 213, 687], \"label\": \"List-item\"}, {\"bbox_2d\": [20, 686, 243, 748], \"label\": \"List-item\"}, {\"bbox_2d\": [27, 812, 217, 867], \"label\": \"List-item\"}, {\"bbox_2d\": [27, 867, 227, 909], \"label\": \"List-item\"}, {\"bbox_2d\": [27, 911, 200, 926], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 138, 476, 169], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 168, 453, 215], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 244, 467, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 290, 470, 367], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 366, 476, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 443, 476, 520], \"label\": \"List-item\"}, {\"bbox_2d\": [268, 519, 467, 566], \"label\": \"List-item\"}, {\"bbox_2d\": [268, 564, 458, 626], \"label\": \"List-item\"}, {\"bbox_2d\": [272, 625, 457, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [272, 671, 458, 764], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 793, 459, 825], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 823, 456, 870], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 869, 446, 885], \"label\": \"List-item\"}, {\"bbox_2d\": [253, 884, 453, 946], \"label\": \"List-item\"}, {\"bbox_2d\": [547, 566, 981, 647], \"label\": \"Text\"}, {\"bbox_2d\": [547, 718, 997, 881], \"label\": \"Text\"}, {\"bbox_2d\": [582, 83, 929, 477], \"label\": \"Picture\"}, {\"bbox_2d\": [725, 480, 817, 496], \"label\": \"Caption\"}, {\"bbox_2d\": [298, 947, 473, 988], \"label\": \"Picture\"}, {\"bbox_2d\": [540, 945, 720, 995], \"label\": \"Picture\"}, {\"bbox_2d\": [972, 958, 990, 978], \"label\": \"Page-footer\"}, {\"bbox_2d\": [27, 958, 46, 978], \"label\": \"Page-footer\"}, {\"bbox_2d\": [552, 42, 775, 79], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [272, 83, 705, 97], \"label\": \"Text\"}, {\"bbox_2d\": [272, 848, 866, 877], \"label\": \"Text\"}, {\"bbox_2d\": [272, 897, 894, 955], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 379, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [272, 108, 912, 821], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 705, 98], \"label\": \"Text\"}, {\"bbox_2d\": [272, 109, 912, 820], \"label\": \"Table\"}, {\"bbox_2d\": [273, 849, 867, 878], \"label\": \"Text\"}, {\"bbox_2d\": [273, 896, 894, 956], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [169, 89, 862, 437], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 466, 944, 501], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 532, 320, 547], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 572, 944, 745], \"label\": \"Text\"}, {\"bbox_2d\": [91, 782, 325, 797], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 823, 944, 964], \"label\": \"Text\"}, {\"bbox_2d\": [508, 994, 527, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 574, 944, 744], \"label\": \"Text\"}, {\"bbox_2d\": [91, 825, 944, 964], \"label\": \"Text\"}, {\"bbox_2d\": [91, 468, 944, 501], \"label\": \"Caption\"}, {\"bbox_2d\": [508, 993, 527, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 532, 319, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 783, 324, 797], \"label\": \"Section-header\"}, {\"bbox_2d\": [170, 87, 861, 436], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 100, 276, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 117, 394, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 148, 965, 898], \"label\": \"Table\"}, {\"bbox_2d\": [861, 1002, 990, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [680, 18, 997, 29], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [681, 18, 997, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [120, 101, 276, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [134, 118, 350, 128], \"label\": \"Section-header\"}, {\"bbox_2d\": [161, 133, 395, 143], \"label\": \"Text\"}, {\"bbox_2d\": [152, 149, 970, 899], \"label\": \"Table\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [147, 151, 660, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 238, 548, 253], \"label\": \"Text\"}, {\"bbox_2d\": [121, 334, 911, 651], \"label\": \"Text\"}, {\"bbox_2d\": [121, 784, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 666, 911, 770], \"label\": \"Formula\"}, {\"bbox_2d\": [217, 722, 911, 799], \"label\": \"Formula\"}, {\"bbox_2d\": [377, 269, 911, 318], \"label\": \"Formula\"}, {\"bbox_2d\": [433, 178, 911, 231], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [147, 150, 661, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 237, 548, 253], \"label\": \"Text\"}, {\"bbox_2d\": [121, 333, 911, 650], \"label\": \"Text\"}, {\"bbox_2d\": [121, 782, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [220, 707, 911, 765], \"label\": \"Formula\"}, {\"bbox_2d\": [247, 655, 911, 717], \"label\": \"Formula\"}, {\"bbox_2d\": [373, 259, 911, 323], \"label\": \"Formula\"}, {\"bbox_2d\": [435, 172, 911, 230], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [241, 88, 806, 313], \"label\": \"Picture\"}, {\"bbox_2d\": [371, 331, 666, 345], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 387, 911, 505], \"label\": \"Text\"}, {\"bbox_2d\": [121, 517, 911, 635], \"label\": \"Text\"}, {\"bbox_2d\": [121, 778, 357, 792], \"label\": \"Text\"}, {\"bbox_2d\": [121, 919, 307, 934], \"label\": \"Text\"}, {\"bbox_2d\": [508, 970, 527, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [321, 818, 716, 910], \"label\": \"Formula\"}, {\"bbox_2d\": [406, 668, 630, 745], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [369, 332, 666, 344], \"label\": \"Caption\"}, {\"bbox_2d\": [238, 94, 807, 321], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 388, 914, 504], \"label\": \"Text\"}, {\"bbox_2d\": [121, 518, 914, 634], \"label\": \"Text\"}, {\"bbox_2d\": [121, 777, 299, 789], \"label\": \"Text\"}, {\"bbox_2d\": [121, 920, 308, 932], \"label\": \"Text\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [406, 667, 629, 742], \"label\": \"Formula\"}, {\"bbox_2d\": [321, 812, 714, 897], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 25, 155, 34], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 118, 57, 128], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 128, 795, 137], \"label\": \"Text\"}, {\"bbox_2d\": [25, 151, 974, 161], \"label\": \"Text\"}, {\"bbox_2d\": [25, 372, 1009, 402], \"label\": \"Text\"}, {\"bbox_2d\": [25, 416, 1009, 435], \"label\": \"Text\"}, {\"bbox_2d\": [25, 460, 804, 470], \"label\": \"Text\"}, {\"bbox_2d\": [25, 485, 1009, 505], \"label\": \"Text\"}, {\"bbox_2d\": [25, 716, 1009, 736], \"label\": \"Text\"}, {\"bbox_2d\": [207, 194, 853, 355], \"label\": \"Picture\"}, {\"bbox_2d\": [207, 539, 826, 698], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [25, 24, 155, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 372, 1009, 401], \"label\": \"Text\"}, {\"bbox_2d\": [25, 415, 1009, 435], \"label\": \"Text\"}, {\"bbox_2d\": [25, 462, 808, 472], \"label\": \"Text\"}, {\"bbox_2d\": [25, 486, 1009, 506], \"label\": \"Text\"}, {\"bbox_2d\": [25, 151, 974, 161], \"label\": \"Text\"}, {\"bbox_2d\": [25, 128, 794, 138], \"label\": \"Text\"}, {\"bbox_2d\": [25, 116, 56, 127], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 451, 75, 462], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 716, 1009, 736], \"label\": \"Text\"}, {\"bbox_2d\": [211, 192, 367, 299], \"label\": \"Picture\"}, {\"bbox_2d\": [440, 192, 594, 299], \"label\": \"Picture\"}, {\"bbox_2d\": [666, 192, 822, 296], \"label\": \"Picture\"}, {\"bbox_2d\": [211, 535, 368, 644], \"label\": \"Picture\"}, {\"bbox_2d\": [442, 536, 592, 641], \"label\": \"Picture\"}, {\"bbox_2d\": [668, 537, 822, 642], \"label\": \"Picture\"}, {\"bbox_2d\": [179, 323, 389, 353], \"label\": \"Caption\"}, {\"bbox_2d\": [412, 323, 621, 353], \"label\": \"Caption\"}, {\"bbox_2d\": [645, 323, 854, 353], \"label\": \"Caption\"}, {\"bbox_2d\": [179, 667, 389, 698], \"label\": \"Caption\"}, {\"bbox_2d\": [412, 667, 621, 698], \"label\": \"Caption\"}, {\"bbox_2d\": [645, 667, 854, 698], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 221, 998, 828], \"label\": \"Text\"}, {\"bbox_2d\": [487, 979, 558, 994], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [463, 135, 579, 164], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 56, 460, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [798, 47, 898, 89], \"label\": \"Picture\"}, {\"bbox_2d\": [486, 982, 553, 996], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 199], \"label\": \"Text\"}, {\"bbox_2d\": [231, 416, 923, 473], \"label\": \"Text\"}, {\"bbox_2d\": [231, 690, 912, 733], \"label\": \"Text\"}, {\"bbox_2d\": [231, 754, 743, 925], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 913, 707, 925], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 656, 484, 666], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 381, 487, 391], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 219, 634, 376], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 493, 635, 650], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 925, 199], \"label\": \"Text\"}, {\"bbox_2d\": [231, 415, 924, 474], \"label\": \"Text\"}, {\"bbox_2d\": [231, 689, 911, 733], \"label\": \"Text\"}, {\"bbox_2d\": [228, 752, 629, 911], \"label\": \"Picture\"}, {\"bbox_2d\": [224, 492, 638, 650], \"label\": \"Picture\"}, {\"bbox_2d\": [228, 218, 632, 376], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 381, 486, 392], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 655, 484, 666], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 913, 705, 924], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [117, 62, 476, 352], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 368, 506, 404], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 443, 506, 625], \"label\": \"Text\"}, {\"bbox_2d\": [91, 627, 506, 674], \"label\": \"Text\"}, {\"bbox_2d\": [91, 725, 506, 804], \"label\": \"Text\"}, {\"bbox_2d\": [91, 873, 506, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 203], \"label\": \"Text\"}, {\"bbox_2d\": [536, 205, 951, 459], \"label\": \"Text\"}, {\"bbox_2d\": [536, 462, 951, 608], \"label\": \"Text\"}, {\"bbox_2d\": [536, 611, 951, 639], \"label\": \"Text\"}, {\"bbox_2d\": [536, 701, 951, 748], \"label\": \"Text\"}, {\"bbox_2d\": [536, 801, 951, 835], \"label\": \"Text\"}, {\"bbox_2d\": [536, 837, 951, 934], \"label\": \"Text\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [101, 828, 506, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [182, 677, 506, 735], \"label\": \"Formula\"}, {\"bbox_2d\": [582, 647, 951, 695], \"label\": \"Formula\"}, {\"bbox_2d\": [589, 730, 951, 783], \"label\": \"Formula\"}, {\"bbox_2d\": [543, 842, 951, 917], \"label\": \"Formula\"}, {\"bbox_2d\": [668, 914, 951, 965], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 890, 585, 904], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 204], \"label\": \"Text\"}, {\"bbox_2d\": [536, 206, 951, 459], \"label\": \"Text\"}, {\"bbox_2d\": [536, 460, 951, 609], \"label\": \"Text\"}, {\"bbox_2d\": [536, 610, 951, 638], \"label\": \"Text\"}, {\"bbox_2d\": [91, 441, 506, 625], \"label\": \"Text\"}, {\"bbox_2d\": [91, 625, 506, 670], \"label\": \"Text\"}, {\"bbox_2d\": [91, 367, 506, 403], \"label\": \"Caption\"}, {\"bbox_2d\": [114, 51, 484, 350], \"label\": \"Picture\"}, {\"bbox_2d\": [185, 679, 506, 720], \"label\": \"Formula\"}, {\"bbox_2d\": [579, 649, 951, 700], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 724, 506, 799], \"label\": \"Text\"}, {\"bbox_2d\": [536, 700, 951, 743], \"label\": \"Text\"}, {\"bbox_2d\": [648, 753, 951, 804], \"label\": \"Formula\"}, {\"bbox_2d\": [629, 843, 951, 893], \"label\": \"Formula\"}, {\"bbox_2d\": [668, 917, 951, 948], \"label\": \"Formula\"}, {\"bbox_2d\": [100, 828, 497, 852], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 874, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 893, 584, 906], \"label\": \"Text\"}, {\"bbox_2d\": [536, 804, 951, 832], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 437, 800, 452], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 479, 911, 526], \"label\": \"Text\"}, {\"bbox_2d\": [121, 748, 911, 846], \"label\": \"Text\"}, {\"bbox_2d\": [121, 858, 911, 956], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [388, 555, 911, 590], \"label\": \"Formula\"}, {\"bbox_2d\": [231, 629, 911, 664], \"label\": \"Formula\"}, {\"bbox_2d\": [298, 701, 911, 737], \"label\": \"Formula\"}, {\"bbox_2d\": [259, 115, 704, 384], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [233, 437, 799, 451], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 481, 911, 524], \"label\": \"Text\"}, {\"bbox_2d\": [121, 747, 911, 846], \"label\": \"Text\"}, {\"bbox_2d\": [121, 858, 911, 956], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [388, 551, 911, 590], \"label\": \"Formula\"}, {\"bbox_2d\": [229, 632, 911, 669], \"label\": \"Formula\"}, {\"bbox_2d\": [304, 702, 911, 739], \"label\": \"Formula\"}, {\"bbox_2d\": [261, 115, 705, 382], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [291, 84, 934, 924], \"label\": \"Table\"}, {\"bbox_2d\": [702, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [440, 882, 924, 919], \"label\": \"Text\"}, {\"bbox_2d\": [440, 835, 926, 877], \"label\": \"Text\"}, {\"bbox_2d\": [502, 630, 928, 785], \"label\": \"Text\"}, {\"bbox_2d\": [502, 797, 884, 824], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [275, 82, 938, 923], \"label\": \"Table\"}, {\"bbox_2d\": [703, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [123, 193, 587, 217], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 118, 330, 133], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 455, 243, 488], \"label\": \"Caption\"}, {\"bbox_2d\": [123, 454, 483, 513], \"label\": \"Caption\"}, {\"bbox_2d\": [123, 295, 482, 513], \"label\": \"Caption\"}, {\"bbox_2d\": [123, 520, 484, 634], \"label\": \"Text\"}, {\"bbox_2d\": [123, 641, 484, 892], \"label\": \"Text\"}, {\"bbox_2d\": [123, 914, 472, 960], \"label\": \"Text\"}, {\"bbox_2d\": [523, 900, 875, 960], \"label\": \"Text\"}, {\"bbox_2d\": [523, 641, 885, 892], \"label\": \"Text\"}, {\"bbox_2d\": [523, 450, 884, 634], \"label\": \"Text\"}, {\"bbox_2d\": [523, 348, 885, 444], \"label\": \"Text\"}, {\"bbox_2d\": [523, 296, 874, 342], \"label\": \"Text\"}, {\"bbox_2d\": [292, 296, 480, 513], \"label\": \"Text\"}, {\"bbox_2d\": [523, 1003, 973, 1046], \"label\": \"Page-footer\"}, {\"bbox_2d\": [52, 1002, 223, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [11, 683, 23, 736], \"label\": \"Page-header\"}, {\"bbox_2d\": [523, 0, 1018, 157], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [123, 123, 332, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 196, 588, 217], \"label\": \"Section-header\"}, {\"bbox_2d\": [12, 682, 22, 739], \"label\": \"Page-header\"}, {\"bbox_2d\": [51, 1002, 223, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [523, 899, 877, 963], \"label\": \"Text\"}, {\"bbox_2d\": [123, 917, 472, 963], \"label\": \"Text\"}, {\"bbox_2d\": [148, 641, 475, 651], \"label\": \"Text\"}, {\"bbox_2d\": [123, 658, 486, 892], \"label\": \"Text\"}, {\"bbox_2d\": [292, 297, 485, 513], \"label\": \"Text\"}, {\"bbox_2d\": [123, 521, 487, 634], \"label\": \"Text\"}, {\"bbox_2d\": [523, 297, 875, 343], \"label\": \"Text\"}, {\"bbox_2d\": [523, 348, 886, 447], \"label\": \"Text\"}, {\"bbox_2d\": [523, 452, 885, 636], \"label\": \"Text\"}, {\"bbox_2d\": [523, 641, 887, 894], \"label\": \"Text\"}, {\"bbox_2d\": [121, 295, 272, 450], \"label\": \"Picture\"}, {\"bbox_2d\": [123, 457, 243, 487], \"label\": \"Caption\"}, {\"bbox_2d\": [522, 0, 1035, 157], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [199, 92, 834, 875], \"label\": \"Table\"}, {\"bbox_2d\": [496, 988, 516, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [199, 92, 833, 875], \"label\": \"Table\"}, {\"bbox_2d\": [496, 986, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 594, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 133, 310, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 170, 889, 200], \"label\": \"Text\"}, {\"bbox_2d\": [121, 368, 914, 411], \"label\": \"Text\"}, {\"bbox_2d\": [121, 235, 801, 343], \"label\": \"Table\"}, {\"bbox_2d\": [152, 412, 909, 606], \"label\": \"Table\"}, {\"bbox_2d\": [189, 658, 911, 866], \"label\": \"Table\"}, {\"bbox_2d\": [145, 623, 299, 635], \"label\": \"Section-header\"}, {\"bbox_2d\": [182, 642, 595, 654], \"label\": \"Text\"}, {\"bbox_2d\": [498, 956, 537, 969], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 599, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 314, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 165, 889, 201], \"label\": \"Text\"}, {\"bbox_2d\": [154, 230, 805, 344], \"label\": \"Table\"}, {\"bbox_2d\": [141, 412, 913, 867], \"label\": \"Table\"}, {\"bbox_2d\": [121, 364, 919, 413], \"label\": \"Text\"}, {\"bbox_2d\": [498, 953, 541, 971], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 92, 156, 107], \"label\": \"Text\"}, {\"bbox_2d\": [125, 155, 472, 169], \"label\": \"Text\"}, {\"bbox_2d\": [125, 320, 909, 388], \"label\": \"Text\"}, {\"bbox_2d\": [125, 469, 909, 647], \"label\": \"Text\"}, {\"bbox_2d\": [125, 659, 909, 863], \"label\": \"Text\"}, {\"bbox_2d\": [125, 874, 909, 941], \"label\": \"Text\"}, {\"bbox_2d\": [228, 428, 807, 443], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 974, 527, 988], \"label\": \"Page-footer\"}, {\"bbox_2d\": [266, 107, 676, 136], \"label\": \"Formula\"}, {\"bbox_2d\": [212, 184, 849, 294], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [125, 92, 155, 105], \"label\": \"Text\"}, {\"bbox_2d\": [358, 104, 677, 156], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 154, 518, 167], \"label\": \"Text\"}, {\"bbox_2d\": [125, 317, 911, 388], \"label\": \"Text\"}, {\"bbox_2d\": [185, 180, 850, 296], \"label\": \"Formula\"}, {\"bbox_2d\": [230, 428, 806, 444], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 470, 911, 646], \"label\": \"Text\"}, {\"bbox_2d\": [125, 659, 911, 862], \"label\": \"Text\"}, {\"bbox_2d\": [125, 875, 911, 942], \"label\": \"Text\"}, {\"bbox_2d\": [508, 974, 527, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [303, 93, 327, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [439, 73, 585, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [699, 93, 722, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [133, 111, 506, 334], \"label\": \"Text\"}, {\"bbox_2d\": [133, 347, 506, 376], \"label\": \"Text\"}, {\"bbox_2d\": [133, 379, 506, 410], \"label\": \"Text\"}, {\"bbox_2d\": [133, 413, 506, 511], \"label\": \"Text\"}, {\"bbox_2d\": [133, 515, 506, 591], \"label\": \"Text\"}, {\"bbox_2d\": [133, 599, 245, 609], \"label\": \"Section-header\"}, {\"bbox_2d\": [133, 632, 506, 666], \"label\": \"Text\"}, {\"bbox_2d\": [133, 670, 506, 822], \"label\": \"Text\"}, {\"bbox_2d\": [133, 826, 506, 855], \"label\": \"Text\"}, {\"bbox_2d\": [133, 858, 506, 887], \"label\": \"Text\"}, {\"bbox_2d\": [133, 891, 506, 929], \"label\": \"Text\"}, {\"bbox_2d\": [133, 933, 506, 984], \"label\": \"Text\"}, {\"bbox_2d\": [526, 887, 895, 938], \"label\": \"Text\"}, {\"bbox_2d\": [526, 847, 895, 887], \"label\": \"Text\"}, {\"bbox_2d\": [526, 797, 895, 847], \"label\": \"Text\"}, {\"bbox_2d\": [526, 712, 895, 741], \"label\": \"Text\"}, {\"bbox_2d\": [526, 659, 895, 683], \"label\": \"Text\"}, {\"bbox_2d\": [526, 577, 895, 606], \"label\": \"Text\"}, {\"bbox_2d\": [526, 467, 895, 629], \"label\": \"Text\"}, {\"bbox_2d\": [526, 410, 895, 469], \"label\": \"Text\"}, {\"bbox_2d\": [526, 324, 895, 403], \"label\": \"Text\"}, {\"bbox_2d\": [526, 232, 895, 316], \"label\": \"Text\"}, {\"bbox_2d\": [526, 111, 895, 196], \"label\": \"Text\"}, {\"bbox_2d\": [526, 196, 652, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 206, 895, 257], \"label\": \"Text\"}, {\"bbox_2d\": [526, 688, 895, 712], \"label\": \"Text\"}, {\"bbox_2d\": [526, 772, 895, 802], \"label\": \"Text\"}, {\"bbox_2d\": [526, 933, 895, 984], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [699, 93, 722, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [438, 73, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 327, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 111, 499, 316], \"label\": \"Text\"}, {\"bbox_2d\": [526, 111, 895, 189], \"label\": \"Text\"}, {\"bbox_2d\": [541, 193, 671, 203], \"label\": \"Text\"}, {\"bbox_2d\": [526, 206, 895, 257], \"label\": \"Text\"}, {\"bbox_2d\": [524, 261, 896, 408], \"label\": \"Text\"}, {\"bbox_2d\": [130, 319, 501, 369], \"label\": \"Text\"}, {\"bbox_2d\": [131, 372, 500, 407], \"label\": \"Text\"}, {\"bbox_2d\": [130, 411, 501, 512], \"label\": \"Text\"}, {\"bbox_2d\": [524, 412, 895, 463], \"label\": \"Text\"}, {\"bbox_2d\": [524, 467, 898, 627], \"label\": \"Text\"}, {\"bbox_2d\": [130, 515, 500, 617], \"label\": \"Text\"}, {\"bbox_2d\": [146, 620, 250, 629], \"label\": \"Text\"}, {\"bbox_2d\": [131, 633, 500, 669], \"label\": \"Text\"}, {\"bbox_2d\": [130, 672, 503, 812], \"label\": \"Text\"}, {\"bbox_2d\": [131, 855, 499, 877], \"label\": \"Text\"}, {\"bbox_2d\": [130, 933, 501, 982], \"label\": \"Text\"}, {\"bbox_2d\": [524, 907, 896, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 868, 895, 904], \"label\": \"Text\"}, {\"bbox_2d\": [526, 802, 895, 865], \"label\": \"Text\"}, {\"bbox_2d\": [526, 776, 895, 799], \"label\": \"Text\"}, {\"bbox_2d\": [526, 737, 895, 773], \"label\": \"Text\"}, {\"bbox_2d\": [526, 711, 898, 734], \"label\": \"Text\"}, {\"bbox_2d\": [527, 685, 893, 708], \"label\": \"Text\"}, {\"bbox_2d\": [526, 659, 894, 682], \"label\": \"Text\"}, {\"bbox_2d\": [526, 631, 895, 655], \"label\": \"Text\"}, {\"bbox_2d\": [130, 815, 519, 851], \"label\": \"Text\"}, {\"bbox_2d\": [130, 881, 519, 930], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [778, 58, 983, 88], \"label\": \"Page-header\"}, {\"bbox_2d\": [117, 105, 533, 195], \"label\": \"Text\"}, {\"bbox_2d\": [571, 105, 986, 287], \"label\": \"Text\"}, {\"bbox_2d\": [117, 199, 533, 289], \"label\": \"Text\"}, {\"bbox_2d\": [539, 987, 566, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [159, 303, 940, 835], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [779, 58, 995, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [571, 101, 996, 287], \"label\": \"Text\"}, {\"bbox_2d\": [117, 103, 539, 195], \"label\": \"Text\"}, {\"bbox_2d\": [117, 196, 541, 288], \"label\": \"Text\"}, {\"bbox_2d\": [161, 300, 945, 840], \"label\": \"Picture\"}, {\"bbox_2d\": [539, 987, 572, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [87, 85, 472, 98], \"label\": \"Text\"}, {\"bbox_2d\": [91, 104, 503, 116], \"label\": \"Text\"}, {\"bbox_2d\": [91, 122, 539, 134], \"label\": \"Text\"}, {\"bbox_2d\": [91, 140, 323, 152], \"label\": \"Text\"}, {\"bbox_2d\": [91, 158, 430, 169], \"label\": \"Text\"}, {\"bbox_2d\": [91, 175, 411, 187], \"label\": \"Text\"}, {\"bbox_2d\": [91, 193, 349, 205], \"label\": \"Text\"}, {\"bbox_2d\": [91, 211, 376, 223], \"label\": \"Text\"}, {\"bbox_2d\": [91, 229, 365, 241], \"label\": \"Text\"}, {\"bbox_2d\": [57, 247, 668, 259], \"label\": \"Text\"}, {\"bbox_2d\": [27, 264, 754, 287], \"label\": \"Text\"}, {\"bbox_2d\": [25, 293, 1008, 319], \"label\": \"Text\"}, {\"bbox_2d\": [57, 325, 125, 337], \"label\": \"Text\"}, {\"bbox_2d\": [27, 366, 1008, 389], \"label\": \"Text\"}, {\"bbox_2d\": [57, 389, 1008, 413], \"label\": \"Text\"}, {\"bbox_2d\": [57, 413, 1008, 441], \"label\": \"Text\"}, {\"bbox_2d\": [57, 441, 1008, 468], \"label\": \"Text\"}, {\"bbox_2d\": [57, 468, 1008, 506], \"label\": \"Text\"}, {\"bbox_2d\": [57, 506, 1008, 533], \"label\": \"Text\"}, {\"bbox_2d\": [57, 533, 1008, 571], \"label\": \"Text\"}, {\"bbox_2d\": [57, 571, 1008, 610], \"label\": \"Text\"}, {\"bbox_2d\": [57, 610, 1008, 637], \"label\": \"Text\"}, {\"bbox_2d\": [504, 671, 529, 683], \"label\": \"Page-footer\"}, {\"bbox_2d\": [57, 554, 1008, 571], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 526, 1008, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 499, 1008, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 472, 1008, 499], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 439, 1008, 472], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 388, 1008, 405], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 344, 91, 356], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 366, 83, 378], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 390, 83, 402], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 413, 83, 425], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 325, 125, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 264, 1008, 287], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 86, 473, 97], \"label\": \"Text\"}, {\"bbox_2d\": [91, 104, 502, 115], \"label\": \"Text\"}, {\"bbox_2d\": [91, 122, 539, 133], \"label\": \"Text\"}, {\"bbox_2d\": [91, 140, 324, 151], \"label\": \"Text\"}, {\"bbox_2d\": [91, 158, 430, 169], \"label\": \"Text\"}, {\"bbox_2d\": [91, 176, 412, 187], \"label\": \"Text\"}, {\"bbox_2d\": [91, 194, 349, 204], \"label\": \"Text\"}, {\"bbox_2d\": [91, 212, 376, 223], \"label\": \"Text\"}, {\"bbox_2d\": [58, 248, 668, 258], \"label\": \"Text\"}, {\"bbox_2d\": [25, 297, 1006, 319], \"label\": \"Text\"}, {\"bbox_2d\": [58, 230, 365, 241], \"label\": \"Section-header\"}, {\"bbox_2d\": [58, 326, 126, 337], \"label\": \"Section-header\"}, {\"bbox_2d\": [26, 344, 1009, 637], \"label\": \"Table\"}, {\"bbox_2d\": [504, 672, 529, 682], \"label\": \"Page-footer\"}, {\"bbox_2d\": [26, 265, 1008, 285], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [741, 59, 923, 86], \"label\": \"Page-header\"}, {\"bbox_2d\": [207, 94, 471, 208], \"label\": \"Picture\"}, {\"bbox_2d\": [206, 220, 630, 235], \"label\": \"Caption\"}, {\"bbox_2d\": [178, 235, 299, 261], \"label\": \"Caption\"}, {\"bbox_2d\": [155, 323, 250, 359], \"label\": \"Caption\"}, {\"bbox_2d\": [336, 361, 454, 397], \"label\": \"Caption\"}, {\"bbox_2d\": [178, 351, 303, 390], \"label\": \"Caption\"}, {\"bbox_2d\": [436, 394, 542, 419], \"label\": \"Caption\"}, {\"bbox_2d\": [154, 488, 446, 522], \"label\": \"Caption\"}, {\"bbox_2d\": [199, 541, 780, 556], \"label\": \"Caption\"}, {\"bbox_2d\": [251, 490, 446, 522], \"label\": \"Caption\"}, {\"bbox_2d\": [595, 488, 871, 517], \"label\": \"Caption\"}, {\"bbox_2d\": [157, 639, 250, 681], \"label\": \"Caption\"}, {\"bbox_2d\": [282, 610, 495, 642], \"label\": \"Caption\"}, {\"bbox_2d\": [526, 676, 844, 691], \"label\": \"Caption\"}, {\"bbox_2d\": [199, 561, 578, 610], \"label\": \"Picture\"}, {\"bbox_2d\": [200, 638, 625, 731], \"label\": \"Picture\"}, {\"bbox_2d\": [284, 738, 477, 753], \"label\": \"Caption\"}, {\"bbox_2d\": [526, 741, 856, 793], \"label\": \"Picture\"}, {\"bbox_2d\": [524, 827, 868, 928], \"label\": \"Picture\"}, {\"bbox_2d\": [208, 940, 735, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 994, 516, 1006], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [737, 56, 934, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [267, 939, 729, 954], \"label\": \"Caption\"}, {\"bbox_2d\": [205, 541, 782, 556], \"label\": \"Caption\"}, {\"bbox_2d\": [202, 92, 889, 211], \"label\": \"Picture\"}, {\"bbox_2d\": [139, 230, 906, 522], \"label\": \"Picture\"}, {\"bbox_2d\": [354, 216, 711, 232], \"label\": \"Caption\"}, {\"bbox_2d\": [94, 558, 934, 930], \"label\": \"Picture\"}, {\"bbox_2d\": [494, 988, 517, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 225, 25], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 37, 1009, 88], \"label\": \"Text\"}, {\"bbox_2d\": [25, 98, 1009, 140], \"label\": \"Text\"}, {\"bbox_2d\": [25, 140, 1009, 166], \"label\": \"Text\"}, {\"bbox_2d\": [25, 208, 1009, 249], \"label\": \"Text\"}, {\"bbox_2d\": [25, 250, 1009, 302], \"label\": \"Text\"}, {\"bbox_2d\": [25, 302, 1009, 334], \"label\": \"Text\"}, {\"bbox_2d\": [25, 334, 1009, 366], \"label\": \"Text\"}, {\"bbox_2d\": [25, 366, 1009, 398], \"label\": \"Text\"}, {\"bbox_2d\": [513, 361, 522, 370], \"label\": \"Page-footer\"}, {\"bbox_2d\": [23, 168, 1012, 194], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [22, 12, 225, 25], \"label\": \"Page-header\"}, {\"bbox_2d\": [22, 38, 1012, 89], \"label\": \"Text\"}, {\"bbox_2d\": [22, 106, 1012, 128], \"label\": \"Text\"}, {\"bbox_2d\": [22, 208, 1012, 244], \"label\": \"Text\"}, {\"bbox_2d\": [22, 255, 1012, 306], \"label\": \"Text\"}, {\"bbox_2d\": [22, 319, 1012, 334], \"label\": \"Text\"}, {\"bbox_2d\": [511, 362, 522, 369], \"label\": \"Page-footer\"}, {\"bbox_2d\": [22, 312, 148, 319], \"label\": \"Section-header\"}, {\"bbox_2d\": [22, 99, 323, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 133, 1012, 196], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [26, 14, 84, 27], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 18, 353, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [86, 100, 918, 452], \"label\": \"Table\"}, {\"bbox_2d\": [86, 467, 918, 813], \"label\": \"Table\"}, {\"bbox_2d\": [91, 822, 900, 846], \"label\": \"Text\"}, {\"bbox_2d\": [91, 950, 908, 973], \"label\": \"Text\"}, {\"bbox_2d\": [91, 999, 169, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 845, 917, 951], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [86, 824, 901, 845], \"label\": \"Text\"}, {\"bbox_2d\": [86, 103, 917, 451], \"label\": \"Table\"}, {\"bbox_2d\": [86, 469, 917, 815], \"label\": \"Table\"}, {\"bbox_2d\": [100, 846, 917, 946], \"label\": \"Table\"}, {\"bbox_2d\": [86, 951, 909, 972], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 24, 164, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 113, 577, 124], \"label\": \"Text\"}, {\"bbox_2d\": [25, 239, 773, 250], \"label\": \"Text\"}, {\"bbox_2d\": [25, 129, 1010, 222], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [25, 24, 164, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 114, 577, 123], \"label\": \"Text\"}, {\"bbox_2d\": [25, 129, 1011, 223], \"label\": \"Table\"}, {\"bbox_2d\": [25, 240, 773, 250], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [999, 334, 1009, 701], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 45, 253, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 77, 203, 92], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 89, 692, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 170, 833, 186], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 201, 586, 212], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 243, 893, 257], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 329, 683, 344], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 360, 737, 372], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 994, 526, 1005], \"label\": \"Page-footer\"}, {\"bbox_2d\": [60, 113, 506, 405], \"label\": \"Picture\"}, {\"bbox_2d\": [528, 100, 967, 424], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [60, 42, 252, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 990, 527, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [995, 334, 1014, 702], \"label\": \"Page-header\"}, {\"bbox_2d\": [56, 71, 977, 433], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 1009, 60], \"label\": \"Text\"}, {\"bbox_2d\": [25, 73, 1009, 174], \"label\": \"Text\"}, {\"bbox_2d\": [25, 178, 1009, 288], \"label\": \"Text\"}, {\"bbox_2d\": [25, 301, 1009, 456], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 993, 59], \"label\": \"Text\"}, {\"bbox_2d\": [25, 72, 1008, 167], \"label\": \"Text\"}, {\"bbox_2d\": [25, 179, 1005, 287], \"label\": \"Text\"}, {\"bbox_2d\": [24, 299, 1003, 455], \"label\": \"Text\"}, {\"bbox_2d\": [508, 467, 525, 478], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [128, 19, 353, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [41, 997, 169, 1017], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 102, 917, 286], \"label\": \"Table\"}, {\"bbox_2d\": [85, 307, 780, 501], \"label\": \"Table\"}, {\"bbox_2d\": [85, 519, 636, 709], \"label\": \"Table\"}, {\"bbox_2d\": [85, 730, 918, 915], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [86, 99, 917, 287], \"label\": \"Table\"}, {\"bbox_2d\": [84, 307, 781, 498], \"label\": \"Table\"}, {\"bbox_2d\": [84, 520, 632, 709], \"label\": \"Table\"}, {\"bbox_2d\": [82, 731, 918, 916], \"label\": \"Table\"}, {\"bbox_2d\": [43, 1001, 231, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [148, 106, 887, 148], \"label\": \"Text\"}, {\"bbox_2d\": [148, 184, 887, 249], \"label\": \"Text\"}, {\"bbox_2d\": [148, 286, 887, 352], \"label\": \"Text\"}, {\"bbox_2d\": [148, 389, 887, 556], \"label\": \"Text\"}, {\"bbox_2d\": [148, 592, 887, 657], \"label\": \"Text\"}, {\"bbox_2d\": [148, 667, 887, 709], \"label\": \"Text\"}, {\"bbox_2d\": [148, 744, 887, 785], \"label\": \"Text\"}, {\"bbox_2d\": [212, 821, 881, 835], \"label\": \"Text\"}, {\"bbox_2d\": [148, 871, 887, 912], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 109, 888, 147], \"label\": \"Text\"}, {\"bbox_2d\": [149, 184, 885, 249], \"label\": \"Text\"}, {\"bbox_2d\": [148, 287, 886, 351], \"label\": \"Text\"}, {\"bbox_2d\": [148, 388, 888, 554], \"label\": \"Text\"}, {\"bbox_2d\": [148, 591, 886, 656], \"label\": \"Text\"}, {\"bbox_2d\": [148, 669, 886, 707], \"label\": \"Text\"}, {\"bbox_2d\": [149, 746, 884, 783], \"label\": \"Text\"}, {\"bbox_2d\": [212, 822, 882, 834], \"label\": \"Text\"}, {\"bbox_2d\": [149, 873, 888, 911], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 91, 701, 103], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 126, 608, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 161, 911, 197], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 220, 908, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 279, 910, 315], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 337, 911, 373], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 396, 910, 432], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 455, 900, 467], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 489, 911, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 547, 911, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 606, 910, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 664, 910, 727], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 749, 910, 785], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 808, 821, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 842, 911, 878], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 900, 911, 936], \"label\": \"List-item\"}, {\"bbox_2d\": [508, 975, 527, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [125, 91, 700, 104], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 126, 607, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 160, 911, 197], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 219, 911, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 278, 911, 314], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 336, 911, 373], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 395, 911, 432], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 454, 901, 466], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 488, 911, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 547, 911, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 606, 910, 643], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 665, 911, 726], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 748, 911, 785], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 807, 822, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 842, 911, 878], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 900, 911, 937], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 974, 527, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 319, 911, 441], \"label\": \"Text\"}, {\"bbox_2d\": [121, 512, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [315, 181, 911, 220], \"label\": \"Formula\"}, {\"bbox_2d\": [337, 273, 911, 295], \"label\": \"Formula\"}, {\"bbox_2d\": [409, 459, 911, 502], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 234, 483, 248], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 233, 484, 249], \"label\": \"Text\"}, {\"bbox_2d\": [121, 316, 911, 442], \"label\": \"Text\"}, {\"bbox_2d\": [121, 511, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [413, 446, 911, 494], \"label\": \"Formula\"}, {\"bbox_2d\": [336, 273, 911, 293], \"label\": \"Formula\"}, {\"bbox_2d\": [391, 169, 911, 218], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 71, 506, 99], \"label\": \"Text\"}, {\"bbox_2d\": [91, 105, 506, 258], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 158], \"label\": \"Text\"}, {\"bbox_2d\": [536, 155, 951, 364], \"label\": \"Text\"}, {\"bbox_2d\": [91, 263, 506, 375], \"label\": \"Text\"}, {\"bbox_2d\": [91, 382, 506, 511], \"label\": \"Text\"}, {\"bbox_2d\": [91, 518, 506, 879], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 506, 949], \"label\": \"Text\"}, {\"bbox_2d\": [536, 394, 951, 411], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 411, 951, 481], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 481, 951, 520], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 520, 951, 560], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 560, 951, 600], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 600, 951, 637], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 637, 951, 675], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 675, 951, 703], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 703, 951, 731], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 731, 951, 759], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 759, 951, 787], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 787, 951, 815], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 815, 951, 843], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 843, 951, 871], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 871, 951, 909], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 909, 951, 937], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 937, 951, 965], \"label\": \"List-item\"}, {\"bbox_2d\": [943, 38, 951, 50], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 71, 506, 100], \"label\": \"Text\"}, {\"bbox_2d\": [91, 101, 506, 231], \"label\": \"Text\"}, {\"bbox_2d\": [91, 233, 506, 374], \"label\": \"Text\"}, {\"bbox_2d\": [91, 383, 506, 506], \"label\": \"Text\"}, {\"bbox_2d\": [91, 516, 506, 795], \"label\": \"Text\"}, {\"bbox_2d\": [91, 797, 506, 951], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 131], \"label\": \"Text\"}, {\"bbox_2d\": [536, 153, 951, 338], \"label\": \"Text\"}, {\"bbox_2d\": [553, 398, 902, 410], \"label\": \"Text\"}, {\"bbox_2d\": [544, 412, 951, 423], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 425, 951, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 480, 951, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 521, 951, 560], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 562, 951, 601], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 603, 951, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 644, 951, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 685, 951, 711], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 713, 951, 724], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 726, 951, 752], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 754, 907, 765], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 767, 951, 793], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 795, 951, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 822, 951, 848], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 849, 951, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 877, 951, 902], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 904, 951, 916], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 918, 951, 943], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [745, 481, 930, 560], \"label\": \"Picture\"}, {\"bbox_2d\": [639, 908, 928, 967], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [742, 475, 929, 561], \"label\": \"Picture\"}, {\"bbox_2d\": [641, 908, 926, 965], \"label\": \"Text\"}, {\"bbox_2d\": [0, 47, 614, 986], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [295, 253, 601, 274], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 310, 575, 324], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 429, 746, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 886, 578, 898], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 903, 868, 943], \"label\": \"Text\"}, {\"bbox_2d\": [334, 754, 883, 868], \"label\": \"Text\"}, {\"bbox_2d\": [334, 666, 883, 752], \"label\": \"Text\"}, {\"bbox_2d\": [334, 578, 875, 664], \"label\": \"Text\"}, {\"bbox_2d\": [334, 520, 875, 572], \"label\": \"Text\"}, {\"bbox_2d\": [334, 446, 873, 515], \"label\": \"Text\"}, {\"bbox_2d\": [334, 328, 877, 413], \"label\": \"Text\"}, {\"bbox_2d\": [51, 999, 223, 1012], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [335, 314, 570, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [335, 432, 742, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [335, 329, 876, 412], \"label\": \"Text\"}, {\"bbox_2d\": [335, 447, 865, 515], \"label\": \"Text\"}, {\"bbox_2d\": [335, 521, 876, 574], \"label\": \"Text\"}, {\"bbox_2d\": [335, 580, 873, 663], \"label\": \"Text\"}, {\"bbox_2d\": [335, 668, 877, 751], \"label\": \"Text\"}, {\"bbox_2d\": [335, 757, 878, 869], \"label\": \"Text\"}, {\"bbox_2d\": [335, 890, 575, 899], \"label\": \"Section-header\"}, {\"bbox_2d\": [335, 905, 864, 943], \"label\": \"Text\"}, {\"bbox_2d\": [51, 1002, 223, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [12, 470, 22, 532], \"label\": \"Page-header\"}, {\"bbox_2d\": [296, 255, 597, 274], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [152, 93, 912, 309], \"label\": \"Table\"}, {\"bbox_2d\": [121, 324, 344, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 699, 448, 712], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 339, 919, 395], \"label\": \"Text\"}, {\"bbox_2d\": [121, 411, 917, 468], \"label\": \"Text\"}, {\"bbox_2d\": [121, 483, 445, 495], \"label\": \"Text\"}, {\"bbox_2d\": [121, 600, 923, 684], \"label\": \"Text\"}, {\"bbox_2d\": [121, 851, 918, 877], \"label\": \"Text\"}, {\"bbox_2d\": [198, 730, 414, 742], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 746, 620, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 762, 495, 774], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 778, 378, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 794, 517, 806], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 810, 485, 822], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 826, 825, 833], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 499, 917, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 529, 766, 541], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 573, 917, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 796, 458, 806], \"label\": \"List-item\"}, {\"bbox_2d\": [497, 941, 536, 954], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 851, 917, 877], \"label\": \"Text\"}, {\"bbox_2d\": [121, 569, 917, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 411, 917, 468], \"label\": \"Text\"}, {\"bbox_2d\": [121, 339, 917, 395], \"label\": \"Text\"}, {\"bbox_2d\": [121, 569, 917, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 569, 917, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 569, 917, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [184, 545, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\":", "label": "```json[{\"bbox_2d\": [152, 495, 919, 525], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 525, 770, 540], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 540, 917, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 569, 917, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 321, 344, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 338, 919, 396], \"label\": \"Text\"}, {\"bbox_2d\": [121, 409, 918, 467], \"label\": \"Text\"}, {\"bbox_2d\": [121, 481, 447, 496], \"label\": \"Text\"}, {\"bbox_2d\": [121, 598, 918, 685], \"label\": \"Text\"}, {\"bbox_2d\": [121, 847, 917, 877], \"label\": \"Text\"}, {\"bbox_2d\": [121, 697, 452, 714], \"label\": \"Section-header\"}, {\"bbox_2d\": [198, 728, 417, 743], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 743, 623, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 758, 498, 773], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 773, 493, 788], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 788, 520, 803], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 803, 488, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [198, 818, 829, 833], \"label\": \"List-item\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [148, 91, 916, 309], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 904, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 131, 926, 159], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 274, 869, 301], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 465, 926, 509], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 610, 926, 671], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 698, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 184, 508, 201], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 220, 923, 263], \"label\": \"Text\"}, {\"bbox_2d\": [255, 319, 585, 329], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 526, 714, 536], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 686, 389, 697], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 341, 855, 447], \"label\": \"Text\"}, {\"bbox_2d\": [255, 549, 829, 591], \"label\": \"Text\"}, {\"bbox_2d\": [231, 611, 926, 671], \"label\": \"Text\"}, {\"bbox_2d\": [109, 709, 926, 951], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 904, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 131, 926, 159], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 274, 869, 301], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 466, 926, 510], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 611, 926, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 186, 503, 201], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 220, 922, 263], \"label\": \"Text\"}, {\"bbox_2d\": [255, 318, 579, 329], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 526, 707, 537], \"label\": \"Caption\"}, {\"bbox_2d\": [251, 333, 926, 452], \"label\": \"Table\"}, {\"bbox_2d\": [109, 703, 924, 952], \"label\": \"Table\"}, {\"bbox_2d\": [255, 548, 822, 591], \"label\": \"Text\"}, {\"bbox_2d\": [109, 687, 383, 698], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 103, 699, 273], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 278, 559, 290], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 313, 879, 341], \"label\": \"Text\"}, {\"bbox_2d\": [231, 362, 557, 377], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 383, 926, 442], \"label\": \"Text\"}, {\"bbox_2d\": [231, 458, 907, 502], \"label\": \"Text\"}, {\"bbox_2d\": [231, 526, 927, 892], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 897, 606, 908], \"label\": \"Caption\"}, {\"bbox_2d\": [703, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [229, 101, 699, 276], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 279, 558, 290], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 898, 606, 909], \"label\": \"Caption\"}, {\"bbox_2d\": [229, 521, 926, 895], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 313, 892, 340], \"label\": \"Text\"}, {\"bbox_2d\": [231, 382, 926, 441], \"label\": \"Text\"}, {\"bbox_2d\": [231, 458, 912, 501], \"label\": \"Text\"}, {\"bbox_2d\": [231, 362, 557, 377], \"label\": \"Section-header\"}, {\"bbox_2d\": [703, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [128, 80, 421, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [128, 112, 933, 170], \"label\": \"Text\"}, {\"bbox_2d\": [125, 247, 725, 664], \"label\": \"Table\"}, {\"bbox_2d\": [783, 262, 992, 581], \"label\": \"Picture\"}, {\"bbox_2d\": [783, 622, 986, 823], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 706, 362, 955], \"label\": \"Picture\"}, {\"bbox_2d\": [486, 706, 716, 957], \"label\": \"Picture\"}, {\"bbox_2d\": [762, 999, 818, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [783, 830, 957, 959], \"label\": \"Picture\"}, {\"bbox_2d\": [125, 830, 957, 957], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [128, 78, 419, 103], \"label\": \"Section-header\"}, {\"bbox_2d\": [128, 111, 955, 169], \"label\": \"Text\"}, {\"bbox_2d\": [767, 1001, 929, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [113, 248, 742, 663], \"label\": \"Table\"}, {\"bbox_2d\": [122, 706, 363, 955], \"label\": \"Picture\"}, {\"bbox_2d\": [478, 706, 716, 956], \"label\": \"Picture\"}, {\"bbox_2d\": [782, 665, 986, 958], \"label\": \"Picture\"}, {\"bbox_2d\": [785, 308, 989, 580], \"label\": \"Picture\"}, {\"bbox_2d\": [787, 262, 989, 296], \"label\": \"Caption\"}, {\"bbox_2d\": [818, 622, 947, 656], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [83, 114, 389, 455], \"label\": \"Picture\"}, {\"bbox_2d\": [83, 721, 997, 897], \"label\": \"Picture\"}, {\"bbox_2d\": [85, 927, 97, 942], \"label\": \"Page-footer\"}, {\"bbox_2d\": [410, 147, 682, 206], \"label\": \"Text\"}, {\"bbox_2d\": [410, 213, 682, 297], \"label\": \"Text\"}, {\"bbox_2d\": [712, 147, 988, 315], \"label\": \"Text\"}, {\"bbox_2d\": [712, 323, 981, 519], \"label\": \"Text\"}, {\"bbox_2d\": [712, 527, 988, 675], \"label\": \"Text\"}, {\"bbox_2d\": [109, 487, 369, 666], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 367, 350, 444], \"label\": \"Caption\"}, {\"bbox_2d\": [410, 349, 686, 653], \"label\": \"Text\"}, {\"bbox_2d\": [432, 661, 682, 675], \"label\": \"Text\"}, {\"bbox_2d\": [410, 321, 606, 339], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [107, 142, 364, 354], \"label\": \"Picture\"}, {\"bbox_2d\": [107, 367, 350, 442], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 491, 369, 667], \"label\": \"Text\"}, {\"bbox_2d\": [410, 145, 682, 206], \"label\": \"Text\"}, {\"bbox_2d\": [410, 212, 685, 295], \"label\": \"Text\"}, {\"bbox_2d\": [410, 346, 688, 653], \"label\": \"Text\"}, {\"bbox_2d\": [432, 660, 683, 676], \"label\": \"Text\"}, {\"bbox_2d\": [410, 322, 604, 341], \"label\": \"Section-header\"}, {\"bbox_2d\": [711, 145, 988, 317], \"label\": \"Text\"}, {\"bbox_2d\": [711, 324, 989, 519], \"label\": \"Text\"}, {\"bbox_2d\": [711, 525, 990, 676], \"label\": \"Text\"}, {\"bbox_2d\": [107, 725, 238, 868], \"label\": \"Picture\"}, {\"bbox_2d\": [259, 725, 390, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [409, 724, 540, 868], \"label\": \"Picture\"}, {\"bbox_2d\": [560, 725, 691, 867], \"label\": \"Picture\"}, {\"bbox_2d\": [712, 725, 843, 868], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 878, 249, 901], \"label\": \"Caption\"}, {\"bbox_2d\": [260, 878, 391, 915], \"label\": \"Caption\"}, {\"bbox_2d\": [411, 878, 527, 901], \"label\": \"Caption\"}, {\"bbox_2d\": [562, 878, 673, 901], \"label\": \"Caption\"}, {\"bbox_2d\": [713, 878, 838, 901], \"label\": \"Caption\"}, {\"bbox_2d\": [85, 928, 96, 944], \"label\": \"Page-footer\"}, {\"bbox_2d\": [863, 724, 994, 866], \"label\": \"Picture\"}, {\"bbox_2d\": [864, 878, 978, 888], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [863, 1002, 990, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [120, 105, 357, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [374, 109, 502, 120], \"label\": \"Text\"}, {\"bbox_2d\": [119, 149, 367, 212], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 232, 252, 244], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 369, 242, 381], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 505, 323, 517], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 611, 355, 623], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 734, 217, 746], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 842, 214, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 149, 545, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 271, 524, 283], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 392, 565, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 514, 589, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 635, 512, 647], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 743, 628, 755], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 865, 534, 877], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 858, 301, 925], \"label\": \"Text\"}, {\"bbox_2d\": [119, 752, 356, 820], \"label\": \"Text\"}, {\"bbox_2d\": [417, 759, 647, 844], \"label\": \"Text\"}, {\"bbox_2d\": [417, 653, 625, 721], \"label\": \"Text\"}, {\"bbox_2d\": [417, 530, 638, 616], \"label\": \"Text\"}, {\"bbox_2d\": [119, 523, 337, 590], \"label\": \"Text\"}, {\"bbox_2d\": [119, 386, 351, 485], \"label\": \"Text\"}, {\"bbox_2d\": [119, 249, 344, 348], \"label\": \"Text\"}, {\"bbox_2d\": [417, 166, 612, 250], \"label\": \"Text\"}, {\"bbox_2d\": [417, 288, 627, 372], \"label\": \"Text\"}, {\"bbox_2d\": [417, 423, 638, 491], \"label\": \"Text\"}, {\"bbox_2d\": [417, 594, 564, 616], \"label\": \"Text\"}, {\"bbox_2d\": [417, 690, 564, 721], \"label\": \"Text\"}, {\"bbox_2d\": [417, 759, 636, 815], \"label\": \"Text\"}, {\"bbox_2d\": [417, 882, 651, 965], \"label\": \"Text\"}, {\"bbox_2d\": [714, 149, 922, 233], \"label\": \"Text\"}, {\"bbox_2d\": [714, 256, 928, 340], \"label\": \"Text\"}, {\"bbox_2d\": [714, 364, 949, 420], \"label\": \"Text\"}, {\"bbox_2d\": [714, 433, 870, 455], \"label\": \"Text\"}, {\"bbox_2d\": [714, 487, 937, 526], \"label\": \"Text\"}, {\"bbox_2d\": [714, 545, 904, 589], \"label\": \"Text\"}, {\"bbox_2d\": [714, 593, 945, 662], \"label\": \"Text\"}, {\"bbox_2d\": [714, 667, 870, 690], \"label\": \"Text\"}, {\"bbox_2d\": [714, 709, 955, 755], \"label\": \"Text\"}, {\"bbox_2d\": [714, 759, 951, 815], \"label\": \"Text\"}, {\"bbox_2d\": [714, 819, 870, 842], \"label\": \"Text\"}, {\"bbox_2d\": [714, 846, 870, 868], \"label\": \"Text\"}, {\"bbox_2d\": [417, 832, 608, 844], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [120, 105, 502, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [127, 151, 283, 164], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 183, 292, 209], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 232, 253, 244], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 369, 242, 381], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 505, 323, 518], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 612, 355, 625], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 734, 217, 747], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 841, 215, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 149, 545, 162], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 271, 525, 284], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 393, 563, 420], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 514, 590, 527], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 636, 513, 648], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 743, 628, 755], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 864, 535, 877], \"label\": \"Section-header\"}, {\"bbox_2d\": [714, 149, 921, 162], \"label\": \"Section-header\"}, {\"bbox_2d\": [714, 256, 822, 269], \"label\": \"Section-header\"}, {\"bbox_2d\": [714, 363, 946, 375], \"label\": \"Section-header\"}, {\"bbox_2d\": [714, 486, 938, 524], \"label\": \"Section-header\"}, {\"bbox_2d\": [714, 546, 910, 588], \"label\": \"Section-header\"}, {\"bbox_2d\": [714, 711, 910, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 249, 345, 348], \"label\": \"Text\"}, {\"bbox_2d\": [120, 385, 350, 485], \"label\": \"Text\"}, {\"bbox_2d\": [120, 522, 334, 592], \"label\": \"Text\"}, {\"bbox_2d\": [120, 629, 347, 713], \"label\": \"Text\"}, {\"bbox_2d\": [120, 751, 355, 820], \"label\": \"Text\"}, {\"bbox_2d\": [120, 858, 301, 927], \"label\": \"Text\"}, {\"bbox_2d\": [417, 881, 649, 966], \"label\": \"Text\"}, {\"bbox_2d\": [417, 759, 652, 844], \"label\": \"Text\"}, {\"bbox_2d\": [417, 653, 624, 722], \"label\": \"Text\"}, {\"bbox_2d\": [417, 531, 606, 615], \"label\": \"Text\"}, {\"bbox_2d\": [417, 424, 644, 493], \"label\": \"Text\"}, {\"bbox_2d\": [417, 288, 628, 372], \"label\": \"Text\"}, {\"bbox_2d\": [417, 166, 612, 250], \"label\": \"Text\"}, {\"bbox_2d\": [714, 166, 933, 235], \"label\": \"Text\"}, {\"bbox_2d\": [714, 273, 929, 342], \"label\": \"Text\"}, {\"bbox_2d\": [714, 380, 957, 449], \"label\": \"Text\"}, {\"bbox_2d\": [714, 592, 944, 691], \"label\": \"Text\"}, {\"bbox_2d\": [714, 757, 959, 841], \"label\": \"Text\"}, {\"bbox_2d\": [863, 999, 992, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 72, 506, 97], \"label\": \"Text\"}, {\"bbox_2d\": [536, 72, 951, 186], \"label\": \"Text\"}, {\"bbox_2d\": [536, 188, 951, 290], \"label\": \"Text\"}, {\"bbox_2d\": [536, 277, 951, 323], \"label\": \"Text\"}, {\"bbox_2d\": [536, 362, 951, 408], \"label\": \"Text\"}, {\"bbox_2d\": [91, 272, 506, 311], \"label\": \"Text\"}, {\"bbox_2d\": [91, 357, 506, 396], \"label\": \"Text\"}, {\"bbox_2d\": [91, 410, 224, 423], \"label\": \"Text\"}, {\"bbox_2d\": [536, 450, 951, 605], \"label\": \"Text\"}, {\"bbox_2d\": [91, 514, 506, 589], \"label\": \"Text\"}, {\"bbox_2d\": [536, 614, 951, 724], \"label\": \"Text\"}, {\"bbox_2d\": [536, 773, 951, 830], \"label\": \"Text\"}, {\"bbox_2d\": [536, 888, 951, 951], \"label\": \"Text\"}, {\"bbox_2d\": [91, 738, 506, 830], \"label\": \"Text\"}, {\"bbox_2d\": [536, 887, 951, 933], \"label\": \"Text\"}, {\"bbox_2d\": [91, 877, 506, 933], \"label\": \"Text\"}, {\"bbox_2d\": [591, 705, 951, 737], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 571, 506, 653], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 661, 414, 674], \"label\": \"Text\"}, {\"bbox_2d\": [91, 713, 506, 727], \"label\": \"Text\"}, {\"bbox_2d\": [208, 686, 506, 704], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 763, 506, 776], \"label\": \"Text\"}, {\"bbox_2d\": [152, 821, 506, 839], \"label\": \"Formula\"}, {\"bbox_2d\": [591, 815, 951, 858], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 432, 506, 514], \"label\": \"Formula\"}, {\"bbox_2d\": [61, 336, 506, 375], \"label\": \"Formula\"}, {\"bbox_2d\": [60, 299, 506, 337], \"label\": \"Formula\"}, {\"bbox_2d\": [148, 211, 506, 260], \"label\": \"Formula\"}, {\"bbox_2d\": [169, 272, 506, 310], \"label\": \"Formula\"}, {\"bbox_2d\": [596, 247, 951, 275], \"label\": \"Formula\"}, {\"bbox_2d\": [562, 362, 951, 380], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 421, 925, 432], \"label\": \"Section-header\"}, {\"bbox_2d\": [267, 110, 506, 139], \"label\": \"Formula\"}, {\"bbox_2d\": [179, 916, 506, 951], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 787, 506, 805], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 970, 224, 987], \"label\": \"Text\"}, {\"bbox_2d\": [91, 847, 506, 870], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 951, 168], \"label\": \"Text\"}, {\"bbox_2d\": [536, 161, 951, 234], \"label\": \"Text\"}, {\"bbox_2d\": [692, 248, 951, 271], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 276, 951, 319], \"label\": \"Text\"}, {\"bbox_2d\": [536, 362, 951, 390], \"label\": \"Text\"}, {\"bbox_2d\": [556, 334, 951, 356], \"label\": \"Formula\"}, {\"bbox_2d\": [561, 421, 927, 432], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 453, 951, 581], \"label\": \"Text\"}, {\"bbox_2d\": [612, 586, 951, 608], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 614, 951, 717], \"label\": \"Text\"}, {\"bbox_2d\": [606, 730, 951, 769], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 774, 951, 832], \"label\": \"Text\"}, {\"bbox_2d\": [619, 844, 951, 884], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 889, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [154, 918, 506, 958], \"label\": \"Formula\"}, {\"bbox_2d\": [169, 820, 506, 860], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 849, 506, 906], \"label\": \"Text\"}, {\"bbox_2d\": [91, 513, 506, 586], \"label\": \"Text\"}, {\"bbox_2d\": [94, 431, 506, 511], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 310, 531, 356], \"label\": \"Formula\"}, {\"bbox_2d\": [188, 384, 506, 406], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 358, 479, 371], \"label\": \"Text\"}, {\"bbox_2d\": [91, 409, 224, 422], \"label\": \"Text\"}, {\"bbox_2d\": [91, 271, 506, 299], \"label\": \"Text\"}, {\"bbox_2d\": [91, 152, 505, 206], \"label\": \"Text\"}, {\"bbox_2d\": [146, 209, 506, 248], \"label\": \"Formula\"}, {\"bbox_2d\": [162, 245, 506, 267], \"label\": \"Formula\"}, {\"bbox_2d\": [214, 110, 506, 150], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 71, 506, 99], \"label\": \"Text\"}, {\"bbox_2d\": [91, 662, 418, 675], \"label\": \"Text\"}, {\"bbox_2d\": [92, 598, 506, 658], \"label\": \"Formula\"}, {\"bbox_2d\": [153, 685, 506, 725], \"label\": \"Formula\"}, {\"bbox_2d\": [99, 738, 506, 777], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 713, 240, 726], \"label\": \"Text\"}, {\"bbox_2d\": [91, 767, 506, 819], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 139], \"label\": \"Text\"}, {\"bbox_2d\": [131, 210, 526, 224], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 238, 704, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 266, 741, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 294, 911, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 351, 894, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 379, 714, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 407, 808, 421], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 435, 811, 450], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 464, 811, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 492, 880, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 521, 762, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 549, 781, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 578, 738, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 606, 911, 649], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 663, 911, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 720, 911, 763], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 777, 911, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 834, 741, 848], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 862, 867, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 890, 664, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 918, 634, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 946, 764, 960], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 910, 141], \"label\": \"Text\"}, {\"bbox_2d\": [131, 210, 526, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 239, 705, 253], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 267, 742, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 295, 911, 338], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 352, 893, 366], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 380, 712, 395], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 409, 808, 423], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 437, 811, 451], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 466, 811, 480], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 494, 880, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 522, 765, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 550, 782, 565], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 579, 738, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 607, 911, 650], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 664, 910, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 721, 910, 763], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 777, 911, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 834, 740, 848], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 862, 867, 877], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 891, 665, 905], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 919, 634, 933], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 947, 764, 962], \"label\": \"List-item\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 116, 432, 129], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 988, 522, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 115, 431, 128], \"label\": \"Text\"}, {\"bbox_2d\": [513, 989, 522, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [156, 78, 936, 110], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 116, 936, 148], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 154, 601, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 242, 936, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 280, 936, 312], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 317, 936, 349], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 355, 903, 367], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 373, 936, 405], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 411, 936, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 447, 862, 459], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 465, 936, 497], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 503, 936, 531], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 537, 618, 550], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 556, 688, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 575, 643, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 593, 777, 606], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 612, 936, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 650, 804, 663], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 669, 857, 681], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 687, 936, 720], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 726, 492, 738], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 744, 601, 757], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 781, 508, 794], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 816, 261, 829], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 853, 936, 940], \"label\": \"Text\"}, {\"bbox_2d\": [84, 979, 182, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [928, 959, 953, 971], \"label\": \"Page-footer\"}, {\"bbox_2d\": [916, 979, 953, 995], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 189, 549, 203], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 224, 931, 238], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [154, 74, 943, 112], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 111, 941, 149], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 148, 606, 168], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 184, 554, 204], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 220, 933, 240], \"label\": \"Text\"}, {\"bbox_2d\": [154, 239, 943, 277], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 276, 945, 313], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 312, 945, 350], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 349, 909, 369], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 368, 942, 406], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 405, 942, 443], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 442, 868, 461], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 460, 946, 498], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 497, 945, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 534, 621, 554], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 553, 693, 573], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 572, 646, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 591, 782, 610], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 609, 942, 647], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 646, 806, 666], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 665, 861, 685], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 684, 943, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 721, 497, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 740, 606, 759], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 775, 513, 796], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 812, 265, 832], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 848, 944, 886], \"label\": \"Text\"}, {\"bbox_2d\": [147, 884, 946, 940], \"label\": \"Text\"}, {\"bbox_2d\": [84, 974, 187, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [913, 955, 959, 1001], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [208, 125, 828, 185], \"label\": \"Picture\"}, {\"bbox_2d\": [155, 211, 848, 225], \"label\": \"Caption\"}, {\"bbox_2d\": [155, 258, 881, 554], \"label\": \"Text\"}, {\"bbox_2d\": [155, 587, 881, 728], \"label\": \"Text\"}, {\"bbox_2d\": [155, 743, 881, 851], \"label\": \"Text\"}, {\"bbox_2d\": [508, 877, 527, 891], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [207, 120, 828, 173], \"label\": \"Picture\"}, {\"bbox_2d\": [188, 212, 847, 226], \"label\": \"Caption\"}, {\"bbox_2d\": [155, 256, 880, 553], \"label\": \"Text\"}, {\"bbox_2d\": [155, 586, 880, 726], \"label\": \"Text\"}, {\"bbox_2d\": [155, 743, 880, 851], \"label\": \"Text\"}, {\"bbox_2d\": [508, 877, 527, 891], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [157, 90, 885, 147], \"label\": \"Text\"}, {\"bbox_2d\": [157, 156, 885, 196], \"label\": \"Text\"}, {\"bbox_2d\": [157, 205, 884, 262], \"label\": \"Text\"}, {\"bbox_2d\": [157, 308, 884, 365], \"label\": \"Text\"}, {\"bbox_2d\": [157, 373, 884, 399], \"label\": \"Text\"}, {\"bbox_2d\": [171, 408, 775, 421], \"label\": \"Text\"}, {\"bbox_2d\": [157, 284, 315, 296], \"label\": \"Section-header\"}, {\"bbox_2d\": [185, 428, 884, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 463, 421, 475], \"label\": \"List-item\"}, {\"bbox_2d\": [229, 483, 884, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 516, 884, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [324, 608, 712, 651], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 673, 376, 686], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 762, 293, 775], \"label\": \"Section-header\"}, {\"bbox_2d\": [157, 697, 884, 740], \"label\": \"Text\"}, {\"bbox_2d\": [171, 786, 512, 799], \"label\": \"Text\"}, {\"bbox_2d\": [185, 808, 530, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 829, 884, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [225, 861, 884, 887], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 896, 884, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [185, 88, 884, 147], \"label\": \"Text\"}, {\"bbox_2d\": [645, 608, 884, 622], \"label\": \"Section-header\"}, {\"bbox_2d\": [324, 634, 712, 651], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 979, 526, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [156, 91, 884, 147], \"label\": \"Text\"}, {\"bbox_2d\": [156, 155, 884, 197], \"label\": \"Text\"}, {\"bbox_2d\": [156, 205, 884, 261], \"label\": \"Text\"}, {\"bbox_2d\": [156, 308, 884, 365], \"label\": \"Text\"}, {\"bbox_2d\": [156, 373, 884, 400], \"label\": \"Text\"}, {\"bbox_2d\": [171, 408, 775, 419], \"label\": \"Text\"}, {\"bbox_2d\": [757, 610, 886, 622], \"label\": \"Text\"}, {\"bbox_2d\": [156, 698, 884, 740], \"label\": \"Text\"}, {\"bbox_2d\": [171, 787, 516, 799], \"label\": \"Text\"}, {\"bbox_2d\": [434, 606, 610, 623], \"label\": \"Section-header\"}, {\"bbox_2d\": [323, 635, 719, 650], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 674, 380, 685], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 763, 297, 774], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 284, 319, 296], \"label\": \"Section-header\"}, {\"bbox_2d\": [186, 427, 884, 454], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 462, 423, 474], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 482, 884, 508], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 516, 884, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 807, 534, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 827, 884, 853], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 861, 884, 888], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 896, 884, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 109, 887, 249], \"label\": \"Text\"}, {\"bbox_2d\": [147, 287, 887, 911], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 108, 887, 249], \"label\": \"Text\"}, {\"bbox_2d\": [148, 286, 902, 911], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 141, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [49, 57, 571, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 75, 1009, 131], \"label\": \"Text\"}, {\"bbox_2d\": [25, 137, 1009, 194], \"label\": \"Text\"}, {\"bbox_2d\": [25, 200, 1009, 257], \"label\": \"Text\"}, {\"bbox_2d\": [508, 305, 525, 315], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 140, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [49, 58, 571, 69], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 75, 1009, 131], \"label\": \"Text\"}, {\"bbox_2d\": [25, 138, 1009, 194], \"label\": \"Text\"}, {\"bbox_2d\": [25, 200, 1009, 256], \"label\": \"Text\"}, {\"bbox_2d\": [508, 305, 525, 316], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [65, 54, 248, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [82, 999, 265, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [493, 979, 513, 994], \"label\": \"Page-footer\"}, {\"bbox_2d\": [76, 95, 476, 130], \"label\": \"Text\"}, {\"bbox_2d\": [82, 135, 478, 205], \"label\": \"Text\"}, {\"bbox_2d\": [82, 209, 479, 433], \"label\": \"Text\"}, {\"bbox_2d\": [82, 437, 480, 606], \"label\": \"Text\"}, {\"bbox_2d\": [82, 611, 480, 761], \"label\": \"Text\"}, {\"bbox_2d\": [86, 771, 481, 827], \"label\": \"Text\"}, {\"bbox_2d\": [86, 831, 481, 867], \"label\": \"Text\"}, {\"bbox_2d\": [76, 872, 482, 968], \"label\": \"Text\"}, {\"bbox_2d\": [519, 733, 931, 970], \"label\": \"Text\"}, {\"bbox_2d\": [519, 507, 931, 598], \"label\": \"Text\"}, {\"bbox_2d\": [519, 303, 931, 503], \"label\": \"Text\"}, {\"bbox_2d\": [537, 93, 931, 297], \"label\": \"Text\"}, {\"bbox_2d\": [558, 602, 762, 617], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 621, 695, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 639, 859, 655], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 659, 734, 674], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 677, 699, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 696, 794, 711], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 715, 738, 730], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [65, 51, 264, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 90, 944, 296], \"label\": \"Text\"}, {\"bbox_2d\": [83, 91, 485, 129], \"label\": \"Text\"}, {\"bbox_2d\": [82, 129, 487, 204], \"label\": \"Text\"}, {\"bbox_2d\": [81, 204, 491, 429], \"label\": \"Text\"}, {\"bbox_2d\": [517, 297, 946, 502], \"label\": \"Text\"}, {\"bbox_2d\": [518, 503, 943, 595], \"label\": \"Text\"}, {\"bbox_2d\": [83, 429, 492, 602], \"label\": \"Text\"}, {\"bbox_2d\": [84, 604, 493, 758], \"label\": \"Text\"}, {\"bbox_2d\": [520, 728, 947, 966], \"label\": \"Text\"}, {\"bbox_2d\": [85, 866, 493, 963], \"label\": \"Text\"}, {\"bbox_2d\": [87, 825, 488, 864], \"label\": \"Text\"}, {\"bbox_2d\": [86, 765, 493, 823], \"label\": \"Text\"}, {\"bbox_2d\": [493, 973, 520, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [558, 596, 767, 614], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 615, 703, 633], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 633, 875, 651], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 653, 742, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [558, 672, 711, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [559, 690, 812, 708], \"label\": \"List-item\"}, {\"bbox_2d\": [559, 709, 754, 726], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [408, 123, 750, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [187, 123, 328, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [309, 260, 490, 291], \"label\": \"Title\"}, {\"bbox_2d\": [545, 295, 612, 309], \"label\": \"Title\"}, {\"bbox_2d\": [432, 320, 727, 330], \"label\": \"Text\"}, {\"bbox_2d\": [309, 343, 850, 366], \"label\": \"Text\"}, {\"bbox_2d\": [309, 374, 503, 385], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 392, 848, 418], \"label\": \"Text\"}, {\"bbox_2d\": [309, 424, 681, 435], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 489, 682, 500], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 718, 850, 782], \"label\": \"Text\"}, {\"bbox_2d\": [309, 784, 848, 822], \"label\": \"Text\"}, {\"bbox_2d\": [378, 836, 848, 860], \"label\": \"Text\"}, {\"bbox_2d\": [378, 862, 848, 898], \"label\": \"Text\"}, {\"bbox_2d\": [378, 677, 848, 715], \"label\": \"Text\"}, {\"bbox_2d\": [378, 643, 712, 652], \"label\": \"Text\"}, {\"bbox_2d\": [378, 654, 651, 663], \"label\": \"Text\"}, {\"bbox_2d\": [378, 665, 651, 674], \"label\": \"Text\"}, {\"bbox_2d\": [342, 508, 612, 520], \"label\": \"Text\"}, {\"bbox_2d\": [342, 522, 848, 573], \"label\": \"Text\"}, {\"bbox_2d\": [342, 575, 848, 622], \"label\": \"Text\"}, {\"bbox_2d\": [342, 624, 848, 650], \"label\": \"Text\"}, {\"bbox_2d\": [342, 715, 848, 741], \"label\": \"Text\"}, {\"bbox_2d\": [342, 743, 848, 782], \"label\": \"Text\"}, {\"bbox_2d\": [187, 312, 292, 338], \"label\": \"Text\"}, {\"bbox_2d\": [187, 356, 292, 409], \"label\": \"Text\"}, {\"bbox_2d\": [309, 442, 848, 479], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [407, 122, 756, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [187, 122, 334, 136], \"label\": \"Page-header\"}, {\"bbox_2d\": [309, 259, 497, 290], \"label\": \"Title\"}, {\"bbox_2d\": [546, 294, 618, 309], \"label\": \"Title\"}, {\"bbox_2d\": [200, 313, 285, 322], \"label\": \"Text\"}, {\"bbox_2d\": [209, 327, 275, 337], \"label\": \"Text\"}, {\"bbox_2d\": [187, 355, 294, 407], \"label\": \"Text\"}, {\"bbox_2d\": [187, 490, 272, 500], \"label\": \"Text\"}, {\"bbox_2d\": [431, 319, 730, 329], \"label\": \"Text\"}, {\"bbox_2d\": [309, 340, 857, 366], \"label\": \"Text\"}, {\"bbox_2d\": [309, 391, 859, 416], \"label\": \"Text\"}, {\"bbox_2d\": [309, 441, 858, 479], \"label\": \"Text\"}, {\"bbox_2d\": [343, 507, 616, 519], \"label\": \"Text\"}, {\"bbox_2d\": [343, 520, 865, 571], \"label\": \"Text\"}, {\"bbox_2d\": [343, 572, 858, 624], \"label\": \"Text\"}, {\"bbox_2d\": [343, 624, 857, 650], \"label\": \"Text\"}, {\"bbox_2d\": [411, 650, 620, 663], \"label\": \"Text\"}, {\"bbox_2d\": [411, 664, 655, 676], \"label\": \"Text\"}, {\"bbox_2d\": [377, 677, 857, 715], \"label\": \"Text\"}, {\"bbox_2d\": [309, 716, 857, 780], \"label\": \"Text\"}, {\"bbox_2d\": [309, 781, 858, 820], \"label\": \"Text\"}, {\"bbox_2d\": [377, 820, 578, 833], \"label\": \"Text\"}, {\"bbox_2d\": [377, 834, 856, 859], \"label\": \"Text\"}, {\"bbox_2d\": [377, 860, 856, 898], \"label\": \"Text\"}, {\"bbox_2d\": [309, 374, 506, 384], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 424, 683, 434], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 488, 687, 501], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [255, 93, 926, 121], \"label\": \"Text\"}, {\"bbox_2d\": [255, 537, 625, 548], \"label\": \"Caption\"}, {\"bbox_2d\": [266, 577, 915, 620], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 139, 927, 534], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [255, 93, 926, 121], \"label\": \"Text\"}, {\"bbox_2d\": [266, 576, 915, 619], \"label\": \"Text\"}, {\"bbox_2d\": [255, 537, 624, 548], \"label\": \"Caption\"}, {\"bbox_2d\": [250, 140, 929, 532], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 49, 294, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 128, 916, 208], \"label\": \"Text\"}, {\"bbox_2d\": [121, 210, 918, 275], \"label\": \"Text\"}, {\"bbox_2d\": [121, 277, 918, 358], \"label\": \"Text\"}, {\"bbox_2d\": [121, 360, 899, 391], \"label\": \"Text\"}, {\"bbox_2d\": [121, 392, 917, 423], \"label\": \"Text\"}, {\"bbox_2d\": [121, 425, 909, 457], \"label\": \"Text\"}, {\"bbox_2d\": [121, 460, 917, 507], \"label\": \"Text\"}, {\"bbox_2d\": [121, 509, 918, 590], \"label\": \"Text\"}, {\"bbox_2d\": [121, 592, 917, 657], \"label\": \"Text\"}, {\"bbox_2d\": [121, 659, 918, 704], \"label\": \"Text\"}, {\"bbox_2d\": [121, 706, 914, 738], \"label\": \"List-item\"}, {\"bbox_2d\": [144, 740, 917, 786], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 788, 868, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [144, 821, 917, 853], \"label\": \"List-item\"}, {\"bbox_2d\": [144, 856, 917, 921], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 974, 440, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 126, 954, 209], \"label\": \"Text\"}, {\"bbox_2d\": [121, 209, 952, 275], \"label\": \"Text\"}, {\"bbox_2d\": [121, 275, 953, 358], \"label\": \"Text\"}, {\"bbox_2d\": [121, 358, 933, 391], \"label\": \"Text\"}, {\"bbox_2d\": [121, 391, 952, 424], \"label\": \"Text\"}, {\"bbox_2d\": [121, 424, 941, 457], \"label\": \"Text\"}, {\"bbox_2d\": [121, 457, 953, 507], \"label\": \"Text\"}, {\"bbox_2d\": [121, 507, 960, 589], \"label\": \"Text\"}, {\"bbox_2d\": [121, 590, 953, 655], \"label\": \"Text\"}, {\"bbox_2d\": [121, 656, 952, 705], \"label\": \"Text\"}, {\"bbox_2d\": [121, 788, 894, 821], \"label\": \"Text\"}, {\"bbox_2d\": [145, 706, 948, 738], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 739, 959, 788], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 821, 942, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 855, 941, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 84, 538, 113], \"label\": \"Title\"}, {\"bbox_2d\": [87, 121, 178, 143], \"label\": \"Text\"}, {\"bbox_2d\": [87, 150, 349, 171], \"label\": \"Text\"}, {\"bbox_2d\": [87, 178, 156, 200], \"label\": \"Text\"}, {\"bbox_2d\": [87, 206, 735, 227], \"label\": \"Text\"}, {\"bbox_2d\": [87, 263, 257, 295], \"label\": \"Text\"}, {\"bbox_2d\": [87, 236, 165, 258], \"label\": \"Section-header\"}, {\"bbox_2d\": [485, 1003, 589, 1019], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 85, 535, 113], \"label\": \"Title\"}, {\"bbox_2d\": [87, 123, 178, 142], \"label\": \"Text\"}, {\"bbox_2d\": [87, 151, 347, 171], \"label\": \"Text\"}, {\"bbox_2d\": [87, 180, 156, 199], \"label\": \"Text\"}, {\"bbox_2d\": [87, 208, 729, 228], \"label\": \"Text\"}, {\"bbox_2d\": [87, 265, 505, 299], \"label\": \"Text\"}, {\"bbox_2d\": [87, 237, 160, 256], \"label\": \"Section-header\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [770, 69, 973, 97], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 113, 523, 166], \"label\": \"Text\"}, {\"bbox_2d\": [109, 189, 522, 427], \"label\": \"Text\"}, {\"bbox_2d\": [109, 432, 522, 620], \"label\": \"Text\"}, {\"bbox_2d\": [109, 625, 522, 883], \"label\": \"Text\"}, {\"bbox_2d\": [559, 507, 977, 781], \"label\": \"Text\"}, {\"bbox_2d\": [559, 787, 977, 942], \"label\": \"Text\"}, {\"bbox_2d\": [559, 948, 977, 983], \"label\": \"Text\"}, {\"bbox_2d\": [559, 171, 977, 296], \"label\": \"Text\"}, {\"bbox_2d\": [559, 301, 977, 502], \"label\": \"Text\"}, {\"bbox_2d\": [579, 113, 977, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [126, 888, 467, 902], \"label\": \"List-item\"}, {\"bbox_2d\": [126, 907, 521, 962], \"label\": \"List-item\"}, {\"bbox_2d\": [126, 967, 334, 984], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 994, 555, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [770, 67, 988, 98], \"label\": \"Page-header\"}, {\"bbox_2d\": [582, 110, 980, 166], \"label\": \"Text\"}, {\"bbox_2d\": [110, 111, 529, 166], \"label\": \"Text\"}, {\"bbox_2d\": [108, 186, 537, 428], \"label\": \"Text\"}, {\"bbox_2d\": [561, 167, 989, 297], \"label\": \"Text\"}, {\"bbox_2d\": [561, 298, 988, 503], \"label\": \"Text\"}, {\"bbox_2d\": [108, 429, 533, 621], \"label\": \"Text\"}, {\"bbox_2d\": [109, 167, 519, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [561, 505, 984, 782], \"label\": \"Text\"}, {\"bbox_2d\": [107, 623, 529, 882], \"label\": \"Text\"}, {\"bbox_2d\": [560, 784, 986, 942], \"label\": \"Text\"}, {\"bbox_2d\": [564, 945, 983, 983], \"label\": \"Text\"}, {\"bbox_2d\": [128, 964, 341, 982], \"label\": \"Text\"}, {\"bbox_2d\": [128, 904, 528, 962], \"label\": \"Text\"}, {\"bbox_2d\": [127, 884, 475, 901], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 84, 604, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 115, 910, 147], \"label\": \"Text\"}, {\"bbox_2d\": [292, 150, 913, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 185, 408, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 217, 669, 232], \"label\": \"Text\"}, {\"bbox_2d\": [292, 572, 926, 604], \"label\": \"Text\"}, {\"bbox_2d\": [292, 619, 733, 635], \"label\": \"Text\"}, {\"bbox_2d\": [292, 650, 920, 666], \"label\": \"Text\"}, {\"bbox_2d\": [292, 818, 666, 834], \"label\": \"Text\"}, {\"bbox_2d\": [291, 687, 930, 792], \"label\": \"Table\"}, {\"bbox_2d\": [291, 675, 647, 688], \"label\": \"Caption\"}, {\"bbox_2d\": [291, 244, 930, 547], \"label\": \"Picture\"}, {\"bbox_2d\": [632, 974, 933, 990], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 89, 604, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 118, 910, 146], \"label\": \"Text\"}, {\"bbox_2d\": [293, 219, 670, 233], \"label\": \"Text\"}, {\"bbox_2d\": [293, 151, 914, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 188, 408, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 574, 925, 603], \"label\": \"Text\"}, {\"bbox_2d\": [293, 621, 733, 635], \"label\": \"Text\"}, {\"bbox_2d\": [293, 653, 919, 666], \"label\": \"Text\"}, {\"bbox_2d\": [293, 822, 666, 836], \"label\": \"Text\"}, {\"bbox_2d\": [292, 689, 932, 794], \"label\": \"Table\"}, {\"bbox_2d\": [293, 678, 649, 688], \"label\": \"Caption\"}, {\"bbox_2d\": [493, 244, 668, 255], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 267, 702, 526], \"label\": \"Text\"}, {\"bbox_2d\": [633, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [304, 124, 553, 136], \"label\": \"Section-header\"}, {\"bbox_2d\": [304, 348, 566, 360], \"label\": \"Section-header\"}, {\"bbox_2d\": [304, 156, 945, 281], \"label\": \"Text\"}, {\"bbox_2d\": [304, 294, 945, 328], \"label\": \"Text\"}, {\"bbox_2d\": [304, 382, 945, 438], \"label\": \"Text\"}, {\"bbox_2d\": [304, 449, 945, 530], \"label\": \"Text\"}, {\"bbox_2d\": [304, 541, 945, 622], \"label\": \"Text\"}, {\"bbox_2d\": [304, 633, 945, 691], \"label\": \"Text\"}, {\"bbox_2d\": [304, 702, 945, 827], \"label\": \"Text\"}, {\"bbox_2d\": [334, 861, 391, 873], \"label\": \"Text\"}, {\"bbox_2d\": [334, 934, 566, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [628, 994, 635, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [317, 881, 608, 924], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [304, 122, 554, 136], \"label\": \"Section-header\"}, {\"bbox_2d\": [304, 347, 566, 360], \"label\": \"Section-header\"}, {\"bbox_2d\": [304, 156, 944, 282], \"label\": \"Text\"}, {\"bbox_2d\": [304, 293, 944, 328], \"label\": \"Text\"}, {\"bbox_2d\": [304, 380, 944, 438], \"label\": \"Text\"}, {\"bbox_2d\": [304, 449, 944, 530], \"label\": \"Text\"}, {\"bbox_2d\": [304, 541, 944, 621], \"label\": \"Text\"}, {\"bbox_2d\": [304, 632, 944, 690], \"label\": \"Text\"}, {\"bbox_2d\": [304, 701, 944, 827], \"label\": \"Text\"}, {\"bbox_2d\": [335, 861, 390, 873], \"label\": \"Text\"}, {\"bbox_2d\": [318, 880, 609, 918], \"label\": \"Picture\"}, {\"bbox_2d\": [335, 932, 566, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [628, 989, 636, 1005], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 601, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 132, 670, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 164, 914, 320], \"label\": \"Text\"}, {\"bbox_2d\": [121, 354, 914, 417], \"label\": \"Text\"}, {\"bbox_2d\": [121, 451, 914, 544], \"label\": \"Text\"}, {\"bbox_2d\": [121, 579, 914, 624], \"label\": \"Text\"}, {\"bbox_2d\": [121, 659, 914, 719], \"label\": \"Text\"}, {\"bbox_2d\": [121, 754, 914, 830], \"label\": \"Text\"}, {\"bbox_2d\": [121, 739, 283, 751], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 643, 400, 655], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 564, 306, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 435, 370, 447], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 339, 312, 352], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 941, 537, 953], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 735, 287, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 640, 402, 656], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 560, 309, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 432, 373, 449], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 336, 315, 353], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 674, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 93, 605, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 161, 922, 321], \"label\": \"Text\"}, {\"bbox_2d\": [121, 352, 920, 417], \"label\": \"Text\"}, {\"bbox_2d\": [121, 448, 919, 544], \"label\": \"Text\"}, {\"bbox_2d\": [121, 576, 920, 624], \"label\": \"Text\"}, {\"bbox_2d\": [121, 656, 921, 720], \"label\": \"Text\"}, {\"bbox_2d\": [121, 751, 919, 832], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [269, 110, 767, 148], \"label\": \"Title\"}, {\"bbox_2d\": [285, 164, 752, 205], \"label\": \"Text\"}, {\"bbox_2d\": [423, 191, 613, 205], \"label\": \"Text\"}, {\"bbox_2d\": [180, 214, 911, 404], \"label\": \"Text\"}, {\"bbox_2d\": [180, 407, 278, 419], \"label\": \"Text\"}, {\"bbox_2d\": [180, 480, 506, 627], \"label\": \"Text\"}, {\"bbox_2d\": [533, 444, 911, 564], \"label\": \"Text\"}, {\"bbox_2d\": [533, 567, 911, 749], \"label\": \"Text\"}, {\"bbox_2d\": [180, 633, 506, 826], \"label\": \"Text\"}, {\"bbox_2d\": [533, 751, 911, 932], \"label\": \"Text\"}, {\"bbox_2d\": [180, 829, 506, 965], \"label\": \"Text\"}, {\"bbox_2d\": [533, 936, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [31, 206, 61, 767], \"label\": \"Page-header\"}, {\"bbox_2d\": [226, 446, 402, 459], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [268, 115, 767, 147], \"label\": \"Title\"}, {\"bbox_2d\": [283, 165, 752, 191], \"label\": \"Text\"}, {\"bbox_2d\": [420, 194, 615, 205], \"label\": \"Text\"}, {\"bbox_2d\": [180, 216, 855, 392], \"label\": \"Text\"}, {\"bbox_2d\": [180, 408, 279, 416], \"label\": \"Text\"}, {\"bbox_2d\": [223, 446, 400, 456], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 481, 502, 629], \"label\": \"Text\"}, {\"bbox_2d\": [121, 634, 503, 827], \"label\": \"Text\"}, {\"bbox_2d\": [121, 832, 503, 965], \"label\": \"Text\"}, {\"bbox_2d\": [533, 937, 914, 965], \"label\": \"Text\"}, {\"bbox_2d\": [533, 753, 914, 931], \"label\": \"Text\"}, {\"bbox_2d\": [533, 569, 914, 746], \"label\": \"Text\"}, {\"bbox_2d\": [533, 444, 914, 562], \"label\": \"Text\"}, {\"bbox_2d\": [31, 207, 61, 765], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [171, 108, 848, 122], \"label\": \"Text\"}, {\"bbox_2d\": [133, 128, 874, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 188, 903, 304], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 309, 888, 346], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 352, 903, 445], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 451, 869, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 492, 873, 527], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 532, 903, 668], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 674, 903, 930], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 958, 526, 974], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [171, 106, 855, 122], \"label\": \"Text\"}, {\"bbox_2d\": [133, 126, 880, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 187, 904, 304], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 308, 887, 344], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 348, 907, 445], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 449, 870, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 490, 878, 526], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 530, 902, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 671, 903, 929], \"label\": \"List-item\"}, {\"bbox_2d\": [512, 958, 528, 974], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [247, 348, 770, 403], \"label\": \"Section-header\"}, {\"bbox_2d\": [295, 451, 877, 552], \"label\": \"Text\"}, {\"bbox_2d\": [295, 645, 715, 749], \"label\": \"Picture\"}, {\"bbox_2d\": [645, 796, 932, 1035], \"label\": \"Picture\"}, {\"bbox_2d\": [53, 1001, 223, 1012], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [246, 384, 395, 396], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 454, 876, 549], \"label\": \"Text\"}, {\"bbox_2d\": [291, 645, 717, 751], \"label\": \"Picture\"}, {\"bbox_2d\": [644, 797, 933, 1035], \"label\": \"Picture\"}, {\"bbox_2d\": [51, 1002, 223, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [12, 682, 22, 739], \"label\": \"Page-header\"}, {\"bbox_2d\": [407, 345, 774, 393], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [148, 108, 884, 172], \"label\": \"Text\"}, {\"bbox_2d\": [148, 209, 887, 631], \"label\": \"Text\"}, {\"bbox_2d\": [147, 669, 887, 912], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 108, 886, 172], \"label\": \"Text\"}, {\"bbox_2d\": [148, 210, 888, 631], \"label\": \"Text\"}, {\"bbox_2d\": [148, 668, 887, 911], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 89, 911, 127], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 149, 911, 187], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 209, 911, 246], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 268, 712, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 302, 911, 339], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 361, 911, 399], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 421, 674, 433], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 455, 911, 492], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 513, 911, 549], \"label\": \"List-item\"}, {\"bbox_2d\": [510, 975, 529, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [125, 91, 912, 128], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 150, 911, 187], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 209, 911, 246], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 268, 714, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 302, 911, 339], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 361, 911, 398], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 420, 674, 432], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 454, 911, 491], \"label\": \"List-item\"}, {\"bbox_2d\": [125, 513, 911, 550], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 974, 527, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 453, 113], \"label\": \"Text\"}, {\"bbox_2d\": [121, 221, 162, 236], \"label\": \"Text\"}, {\"bbox_2d\": [121, 548, 914, 623], \"label\": \"Text\"}, {\"bbox_2d\": [121, 883, 815, 900], \"label\": \"Text\"}, {\"bbox_2d\": [121, 864, 677, 878], \"label\": \"Text\"}, {\"bbox_2d\": [125, 627, 756, 856], \"label\": \"Picture\"}, {\"bbox_2d\": [321, 136, 606, 191], \"label\": \"Formula\"}, {\"bbox_2d\": [267, 262, 758, 534], \"label\": \"Formula\"}, {\"bbox_2d\": [508, 970, 527, 984], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 97, 565, 109], \"label\": \"Text\"}, {\"bbox_2d\": [121, 223, 160, 235], \"label\": \"Text\"}, {\"bbox_2d\": [121, 550, 914, 625], \"label\": \"Text\"}, {\"bbox_2d\": [121, 867, 764, 897], \"label\": \"Caption\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [263, 257, 775, 526], \"label\": \"Formula\"}, {\"bbox_2d\": [429, 145, 606, 192], \"label\": \"Formula\"}, {\"bbox_2d\": [130, 636, 756, 854], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 57, 841, 66], \"label\": \"Text\"}, {\"bbox_2d\": [25, 74, 1009, 132], \"label\": \"Text\"}, {\"bbox_2d\": [25, 455, 1009, 488], \"label\": \"Text\"}, {\"bbox_2d\": [25, 496, 1009, 540], \"label\": \"Text\"}, {\"bbox_2d\": [25, 548, 1009, 569], \"label\": \"Text\"}, {\"bbox_2d\": [25, 577, 379, 587], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 594, 1009, 628], \"label\": \"Text\"}, {\"bbox_2d\": [508, 660, 525, 670], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 146, 1000, 250], \"label\": \"Table\"}, {\"bbox_2d\": [25, 280, 1000, 441], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [25, 57, 839, 68], \"label\": \"Text\"}, {\"bbox_2d\": [25, 75, 1009, 131], \"label\": \"Text\"}, {\"bbox_2d\": [23, 140, 996, 250], \"label\": \"Table\"}, {\"bbox_2d\": [25, 279, 1000, 439], \"label\": \"Table\"}, {\"bbox_2d\": [25, 455, 1009, 489], \"label\": \"Text\"}, {\"bbox_2d\": [25, 496, 1009, 541], \"label\": \"Text\"}, {\"bbox_2d\": [25, 548, 1009, 570], \"label\": \"Text\"}, {\"bbox_2d\": [25, 595, 1009, 628], \"label\": \"Text\"}, {\"bbox_2d\": [25, 577, 380, 588], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 660, 525, 671], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 93, 605, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 926, 170], \"label\": \"Text\"}, {\"bbox_2d\": [241, 195, 908, 223], \"label\": \"Text\"}, {\"bbox_2d\": [231, 250, 916, 293], \"label\": \"Text\"}, {\"bbox_2d\": [231, 654, 909, 682], \"label\": \"Text\"}, {\"bbox_2d\": [231, 698, 903, 758], \"label\": \"Text\"}, {\"bbox_2d\": [231, 774, 926, 849], \"label\": \"Text\"}, {\"bbox_2d\": [231, 866, 926, 926], \"label\": \"Text\"}, {\"bbox_2d\": [739, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 619, 540, 630], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 312, 812, 615], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [109, 93, 605, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 926, 170], \"label\": \"Text\"}, {\"bbox_2d\": [241, 195, 909, 223], \"label\": \"Text\"}, {\"bbox_2d\": [231, 249, 916, 292], \"label\": \"Text\"}, {\"bbox_2d\": [231, 654, 907, 681], \"label\": \"Text\"}, {\"bbox_2d\": [231, 698, 902, 757], \"label\": \"Text\"}, {\"bbox_2d\": [231, 774, 926, 849], \"label\": \"Text\"}, {\"bbox_2d\": [231, 865, 925, 925], \"label\": \"Text\"}, {\"bbox_2d\": [231, 620, 540, 630], \"label\": \"Caption\"}, {\"bbox_2d\": [230, 309, 811, 614], \"label\": \"Picture\"}, {\"bbox_2d\": [738, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [812, 33, 989, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [741, 868, 954, 966], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [814, 35, 988, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [737, 868, 955, 967], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 74, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 291, 591, 305], \"label\": \"Caption\"}, {\"bbox_2d\": [210, 404, 819, 702], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [131, 73, 406, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [431, 73, 696, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [713, 73, 897, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 291, 594, 305], \"label\": \"Caption\"}, {\"bbox_2d\": [209, 397, 820, 776], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 18, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 100, 262, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 117, 292, 127], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 132, 323, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 417, 434, 428], \"label\": \"Section-header\"}, {\"bbox_2d\": [131, 148, 481, 277], \"label\": \"Text\"}, {\"bbox_2d\": [131, 282, 466, 412], \"label\": \"Text\"}, {\"bbox_2d\": [152, 449, 481, 744], \"label\": \"Text\"}, {\"bbox_2d\": [131, 774, 481, 958], \"label\": \"Text\"}, {\"bbox_2d\": [58, 1001, 170, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [566, 101, 725, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [587, 117, 913, 170], \"label\": \"Text\"}, {\"bbox_2d\": [587, 175, 913, 277], \"label\": \"Text\"}, {\"bbox_2d\": [587, 282, 913, 351], \"label\": \"Text\"}, {\"bbox_2d\": [545, 356, 886, 562], \"label\": \"Text\"}, {\"bbox_2d\": [587, 582, 906, 655], \"label\": \"Text\"}, {\"bbox_2d\": [587, 670, 895, 729], \"label\": \"Text\"}, {\"bbox_2d\": [587, 567, 742, 578], \"label\": \"Section-header\"}, {\"bbox_2d\": [587, 659, 786, 670], \"label\": \"Section-header\"}, {\"bbox_2d\": [587, 733, 913, 939], \"label\": \"Text\"}, {\"bbox_2d\": [131, 751, 266, 762], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 101, 262, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [58, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [587, 734, 918, 895], \"label\": \"Text\"}, {\"bbox_2d\": [587, 900, 902, 940], \"label\": \"Text\"}, {\"bbox_2d\": [154, 768, 485, 959], \"label\": \"Text\"}, {\"bbox_2d\": [587, 584, 912, 655], \"label\": \"Text\"}, {\"bbox_2d\": [132, 148, 483, 278], \"label\": \"Text\"}, {\"bbox_2d\": [587, 116, 918, 172], \"label\": \"Text\"}, {\"bbox_2d\": [587, 177, 914, 277], \"label\": \"Text\"}, {\"bbox_2d\": [587, 282, 919, 352], \"label\": \"Text\"}, {\"bbox_2d\": [566, 372, 916, 562], \"label\": \"Text\"}, {\"bbox_2d\": [587, 674, 906, 730], \"label\": \"Text\"}, {\"bbox_2d\": [154, 450, 477, 581], \"label\": \"Text\"}, {\"bbox_2d\": [154, 585, 485, 746], \"label\": \"Text\"}, {\"bbox_2d\": [132, 283, 475, 413], \"label\": \"Text\"}, {\"bbox_2d\": [113, 418, 435, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 435, 243, 446], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 357, 887, 367], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 753, 266, 764], \"label\": \"List-item\"}, {\"bbox_2d\": [587, 659, 786, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [566, 569, 742, 580], \"label\": \"List-item\"}, {\"bbox_2d\": [566, 101, 725, 112], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 133, 324, 143], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 118, 292, 128], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [73, 146, 695, 201], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 215, 235, 364], \"label\": \"Text\"}, {\"bbox_2d\": [77, 268, 146, 280], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 309, 119, 321], \"label\": \"Section-header\"}, {\"bbox_2d\": [388, 215, 597, 293], \"label\": \"Text\"}, {\"bbox_2d\": [388, 270, 525, 292], \"label\": \"Text\"}, {\"bbox_2d\": [388, 300, 523, 319], \"label\": \"Text\"}, {\"bbox_2d\": [388, 327, 507, 346], \"label\": \"Text\"}, {\"bbox_2d\": [388, 354, 505, 373], \"label\": \"Text\"}, {\"bbox_2d\": [388, 381, 580, 401], \"label\": \"Text\"}, {\"bbox_2d\": [388, 409, 580, 430], \"label\": \"Text\"}, {\"bbox_2d\": [388, 438, 508, 437], \"label\": \"Text\"}, {\"bbox_2d\": [77, 819, 257, 844], \"label\": \"Caption\"}, {\"bbox_2d\": [696, 215, 762, 227], \"label\": \"Section-header\"}, {\"bbox_2d\": [77, 336, 235, 361], \"label\": \"Text\"}, {\"bbox_2d\": [388, 354, 507, 373], \"label\": \"Text\"}, {\"bbox_2d\": [388, 381, 580, 399], \"label\": \"Text\"}, {\"bbox_2d\": [77, 231, 235, 262], \"label\": \"Text\"}, {\"bbox_2d\": [77, 289, 235, 300], \"label\": \"Text\"}, {\"bbox_2d\": [77, 313, 235, 331], \"label\": \"Text\"}, {\"bbox_2d\": [77, 346, 235, 357], \"label\": \"Text\"}, {\"bbox_2d\": [388, 409, 508, 430], \"label\": \"Text\"}, {\"bbox_2d\": [77, 374, 235, 391], \"label\": \"Text\"}, {\"bbox_2d\": [77, 405, 235, 416], \"label\": \"Text\"}, {\"bbox_2d\": [388, 399, 580, 408], \"label\": \"Text\"}, {\"bbox_2d\": [388, 416, 508, 429], \"label\": \"Text\"}, {\"bbox_2d\": [696, 231, 829, 262], \"label\": \"Text\"}, {\"bbox_2d\": [696, 270, 877, 293], \"label\": \"Text\"}, {\"bbox_2d\": [696, 300, 846, 319], \"label\": \"Text\"}, {\"bbox_2d\": [696, 327, 877, 346], \"label\": \"Text\"}, {\"bbox_2d\": [696, 354, 877, 373], \"label\": \"Text\"}, {\"bbox_2d\": [696, 381, 801, 399], \"label\": \"Text\"}, {\"bbox_2d\": [696, 409, 826, 429], \"label\": \"Text\"}, {\"bbox_2d\": [696, 437, 877, 445], \"label\": \"Text\"}, {\"bbox_2d\": [77, 871, 936, 914], \"label\": \"Section-header\"}, {\"bbox_2d\": [696, 857, 936, 941], \"label\": \"Section-header\"}, {\"bbox_2d\": [1005, 1012, 1019, 1021], \"label\": \"Page-footer\"}, {\"bbox_2d\": [275, 699, 871, 852], \"label\": \"Table\"}, {\"bbox_2d\": [126, 697, 257, 808], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [77, 146, 687, 201], \"label\": \"Section-header\"}, {\"bbox_2d\": [265, 697, 872, 851], \"label\": \"Table\"}, {\"bbox_2d\": [789, 852, 936, 962], \"label\": \"Section-header\"}, {\"bbox_2d\": [276, 867, 734, 895], \"label\": \"Section-header\"}, {\"bbox_2d\": [1006, 1013, 1021, 1023], \"label\": \"Page-footer\"}, {\"bbox_2d\": [62, 808, 255, 842], \"label\": \"Text\"}, {\"bbox_2d\": [126, 695, 258, 806], \"label\": \"Picture\"}, {\"bbox_2d\": [79, 214, 233, 264], \"label\": \"Text\"}, {\"bbox_2d\": [79, 266, 233, 360], \"label\": \"Text\"}, {\"bbox_2d\": [388, 214, 603, 435], \"label\": \"Text\"}, {\"bbox_2d\": [697, 214, 913, 447], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 406, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 102, 616, 117], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 123, 448, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 155, 801, 167], \"label\": \"Text\"}, {\"bbox_2d\": [274, 186, 516, 199], \"label\": \"Text\"}, {\"bbox_2d\": [274, 529, 863, 559], \"label\": \"Text\"}, {\"bbox_2d\": [274, 588, 826, 602], \"label\": \"Text\"}, {\"bbox_2d\": [274, 617, 406, 632], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 571, 578, 587], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 638, 907, 668], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 674, 433, 688], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 706, 484, 719], \"label\": \"Text\"}, {\"bbox_2d\": [274, 207, 824, 482], \"label\": \"Picture\"}, {\"bbox_2d\": [274, 446, 736, 482], \"label\": \"Table\"}, {\"bbox_2d\": [99, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 85, 406, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 102, 617, 118], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 124, 449, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 155, 802, 169], \"label\": \"Text\"}, {\"bbox_2d\": [273, 187, 514, 200], \"label\": \"Text\"}, {\"bbox_2d\": [273, 528, 863, 559], \"label\": \"Text\"}, {\"bbox_2d\": [273, 589, 826, 602], \"label\": \"Text\"}, {\"bbox_2d\": [273, 706, 485, 719], \"label\": \"Text\"}, {\"bbox_2d\": [273, 573, 577, 587], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 620, 406, 633], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 640, 907, 668], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 675, 434, 688], \"label\": \"List-item\"}, {\"bbox_2d\": [270, 202, 915, 503], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [732, 49, 936, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [161, 91, 804, 942], \"label\": \"Picture\"}, {\"bbox_2d\": [344, 949, 688, 966], \"label\": \"Caption\"}, {\"bbox_2d\": [502, 988, 532, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [154, 87, 840, 942], \"label\": \"Picture\"}, {\"bbox_2d\": [342, 947, 701, 963], \"label\": \"Caption\"}, {\"bbox_2d\": [732, 48, 948, 79], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [78, 43, 696, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [105, 99, 358, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [105, 129, 884, 170], \"label\": \"Text\"}, {\"bbox_2d\": [105, 188, 577, 199], \"label\": \"Text\"}, {\"bbox_2d\": [105, 551, 914, 682], \"label\": \"Text\"}, {\"bbox_2d\": [105, 702, 907, 788], \"label\": \"Text\"}, {\"bbox_2d\": [105, 808, 910, 849], \"label\": \"Text\"}, {\"bbox_2d\": [105, 868, 888, 894], \"label\": \"Text\"}, {\"bbox_2d\": [62, 994, 438, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [128, 199, 853, 520], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [77, 43, 695, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [105, 99, 358, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 991, 437, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [105, 867, 888, 894], \"label\": \"Text\"}, {\"bbox_2d\": [105, 807, 911, 849], \"label\": \"Text\"}, {\"bbox_2d\": [105, 701, 933, 788], \"label\": \"Text\"}, {\"bbox_2d\": [105, 551, 915, 683], \"label\": \"Text\"}, {\"bbox_2d\": [105, 188, 577, 200], \"label\": \"Text\"}, {\"bbox_2d\": [105, 128, 885, 170], \"label\": \"Text\"}, {\"bbox_2d\": [121, 202, 852, 517], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 149, 519, 332], \"label\": \"Table\"}, {\"bbox_2d\": [121, 402, 520, 483], \"label\": \"Table\"}, {\"bbox_2d\": [121, 506, 521, 560], \"label\": \"Table\"}, {\"bbox_2d\": [121, 591, 521, 766], \"label\": \"Table\"}, {\"bbox_2d\": [121, 820, 521, 968], \"label\": \"Table\"}, {\"bbox_2d\": [121, 974, 497, 990], \"label\": \"Text\"}, {\"bbox_2d\": [121, 337, 520, 392], \"label\": \"Text\"}, {\"bbox_2d\": [121, 486, 273, 496], \"label\": \"Text\"}, {\"bbox_2d\": [121, 767, 500, 789], \"label\": \"Text\"}, {\"bbox_2d\": [121, 790, 508, 813], \"label\": \"Text\"}, {\"bbox_2d\": [121, 821, 519, 899], \"label\": \"Table\"}, {\"bbox_2d\": [121, 974, 496, 990], \"label\": \"Text\"}, {\"bbox_2d\": [563, 753, 962, 809], \"label\": \"Text\"}, {\"bbox_2d\": [563, 818, 963, 893], \"label\": \"Table\"}, {\"bbox_2d\": [563, 898, 876, 911], \"label\": \"Text\"}, {\"bbox_2d\": [563, 564, 963, 657], \"label\": \"Table\"}, {\"bbox_2d\": [563, 563, 753, 577], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 578, 189, 592], \"label\": \"Section-header\"}, {\"bbox_2d\": [563, 411, 963, 488], \"label\": \"Table\"}, {\"bbox_2d\": [563, 163, 963, 332], \"label\": \"Table\"}, {\"bbox_2d\": [563, 149, 849, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 149, 292, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [563, 338, 962, 403], \"label\": \"Text\"}, {\"bbox_2d\": [563, 479, 963, 543], \"label\": \"Table\"}, {\"bbox_2d\": [563, 638, 962, 750], \"label\": \"Table\"}, {\"bbox_2d\": [563, 922, 963, 975], \"label\": \"Table\"}, {\"bbox_2d\": [563, 898, 876, 911], \"label\": \"Text\"}, {\"bbox_2d\": [862, 999, 976, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [562, 160, 965, 331], \"label\": \"Table\"}, {\"bbox_2d\": [118, 160, 521, 331], \"label\": \"Table\"}, {\"bbox_2d\": [118, 400, 521, 482], \"label\": \"Table\"}, {\"bbox_2d\": [117, 503, 521, 560], \"label\": \"Table\"}, {\"bbox_2d\": [118, 592, 522, 764], \"label\": \"Table\"}, {\"bbox_2d\": [561, 576, 966, 751], \"label\": \"Table\"}, {\"bbox_2d\": [563, 501, 966, 545], \"label\": \"Table\"}, {\"bbox_2d\": [562, 410, 963, 493], \"label\": \"Table\"}, {\"bbox_2d\": [561, 816, 965, 897], \"label\": \"Table\"}, {\"bbox_2d\": [562, 919, 966, 975], \"label\": \"Table\"}, {\"bbox_2d\": [118, 819, 521, 902], \"label\": \"Table\"}, {\"bbox_2d\": [118, 910, 523, 969], \"label\": \"Table\"}, {\"bbox_2d\": [563, 335, 961, 403], \"label\": \"Text\"}, {\"bbox_2d\": [120, 335, 522, 391], \"label\": \"Text\"}, {\"bbox_2d\": [120, 485, 273, 494], \"label\": \"Text\"}, {\"bbox_2d\": [120, 768, 500, 789], \"label\": \"Text\"}, {\"bbox_2d\": [120, 791, 509, 813], \"label\": \"Text\"}, {\"bbox_2d\": [563, 752, 966, 809], \"label\": \"Text\"}, {\"bbox_2d\": [563, 901, 875, 910], \"label\": \"Text\"}, {\"bbox_2d\": [120, 971, 499, 992], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [563, 566, 754, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 582, 189, 592], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 149, 293, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [563, 149, 848, 159], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 32, 293, 125], \"label\": \"Picture\"}, {\"bbox_2d\": [460, 74, 623, 85], \"label\": \"Text\"}, {\"bbox_2d\": [736, 61, 902, 72], \"label\": \"Page-header\"}, {\"bbox_2d\": [180, 134, 902, 172], \"label\": \"Text\"}, {\"bbox_2d\": [179, 192, 902, 246], \"label\": \"Text\"}, {\"bbox_2d\": [179, 299, 869, 311], \"label\": \"Text\"}, {\"bbox_2d\": [223, 331, 376, 342], \"label\": \"Text\"}, {\"bbox_2d\": [223, 362, 565, 458], \"label\": \"Text\"}, {\"bbox_2d\": [223, 477, 577, 489], \"label\": \"Text\"}, {\"bbox_2d\": [223, 509, 370, 577], \"label\": \"Text\"}, {\"bbox_2d\": [179, 631, 634, 643], \"label\": \"Text\"}, {\"bbox_2d\": [179, 694, 664, 706], \"label\": \"Text\"}, {\"bbox_2d\": [179, 758, 477, 783], \"label\": \"Text\"}, {\"bbox_2d\": [188, 831, 508, 856], \"label\": \"Text\"}, {\"bbox_2d\": [188, 876, 345, 902], \"label\": \"Text\"}, {\"bbox_2d\": [188, 803, 269, 814], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 725, 245, 737], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 662, 561, 675], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 597, 527, 609], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 266, 507, 278], \"label\": \"Section-header\"}, {\"bbox_2d\": [180, 134, 201, 145], \"label\": \"List-item\"}, {\"bbox_2d\": [557, 831, 761, 856], \"label\": \"Text\"}, {\"bbox_2d\": [557, 873, 714, 902], \"label\": \"Text\"}, {\"bbox_2d\": [557, 873, 627, 884], \"label\": \"Text\"}, {\"bbox_2d\": [179, 922, 338, 934], \"label\": \"Text\"}, {\"bbox_2d\": [886, 975, 899, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [458, 72, 627, 86], \"label\": \"Text\"}, {\"bbox_2d\": [179, 190, 906, 247], \"label\": \"Text\"}, {\"bbox_2d\": [179, 297, 874, 311], \"label\": \"Text\"}, {\"bbox_2d\": [179, 755, 482, 783], \"label\": \"Text\"}, {\"bbox_2d\": [179, 131, 907, 173], \"label\": \"List-item\"}, {\"bbox_2d\": [224, 328, 380, 342], \"label\": \"Text\"}, {\"bbox_2d\": [224, 360, 569, 458], \"label\": \"Text\"}, {\"bbox_2d\": [224, 476, 581, 490], \"label\": \"Text\"}, {\"bbox_2d\": [224, 507, 374, 578], \"label\": \"Text\"}, {\"bbox_2d\": [179, 628, 639, 642], \"label\": \"Text\"}, {\"bbox_2d\": [179, 692, 669, 706], \"label\": \"Text\"}, {\"bbox_2d\": [188, 829, 511, 857], \"label\": \"Text\"}, {\"bbox_2d\": [188, 875, 348, 902], \"label\": \"Text\"}, {\"bbox_2d\": [179, 920, 343, 934], \"label\": \"Text\"}, {\"bbox_2d\": [557, 829, 765, 857], \"label\": \"Text\"}, {\"bbox_2d\": [557, 871, 718, 902], \"label\": \"Text\"}, {\"bbox_2d\": [886, 973, 900, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [188, 801, 274, 814], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 724, 250, 738], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 660, 566, 675], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 596, 533, 611], \"label\": \"Section-header\"}, {\"bbox_2d\": [179, 264, 512, 279], \"label\": \"Section-header\"}, {\"bbox_2d\": [735, 60, 905, 71], \"label\": \"Page-header\"}, {\"bbox_2d\": [112, 34, 290, 122], \"label\": \"Picture\"}, {\"bbox_2d\": [829, 83, 902, 121], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [893, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 96, 911, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 153, 911, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 210, 730, 224], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 239, 802, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 267, 841, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 295, 911, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 352, 860, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 379, 905, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 407, 737, 420], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 435, 911, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 493, 911, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 549, 662, 563], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 578, 911, 619], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 634, 911, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 692, 815, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 721, 559, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 748, 857, 761], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 776, 872, 790], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 805, 718, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 834, 802, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 862, 911, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 919, 761, 933], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 948, 911, 961], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [891, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 96, 910, 139], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 153, 910, 195], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 209, 733, 224], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 238, 805, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 266, 844, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 294, 910, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 351, 853, 365], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 379, 911, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 408, 738, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 436, 910, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 493, 910, 536], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 549, 663, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 578, 910, 620], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 635, 910, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 691, 816, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 720, 560, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 748, 858, 762], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 776, 870, 791], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 805, 717, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 833, 801, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 861, 910, 904], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 918, 761, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 946, 910, 961], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 92, 723, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 328, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 152, 895, 160], \"label\": \"Text\"}, {\"bbox_2d\": [282, 123, 351, 133], \"label\": \"Text\"}, {\"bbox_2d\": [157, 139, 353, 241], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 268, 354, 368], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 395, 355, 494], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 521, 355, 633], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 660, 354, 750], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 777, 355, 857], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 868, 354, 967], \"label\": \"Picture\"}, {\"bbox_2d\": [467, 142, 500, 150], \"label\": \"Text\"}, {\"bbox_2d\": [466, 265, 500, 273], \"label\": \"Text\"}, {\"bbox_2d\": [467, 390, 500, 398], \"label\": \"Text\"}, {\"bbox_2d\": [466, 517, 500, 525], \"label\": \"Text\"}, {\"bbox_2d\": [466, 642, 500, 650], \"label\": \"Text\"}, {\"bbox_2d\": [467, 753, 500, 761], \"label\": \"Text\"}, {\"bbox_2d\": [467, 864, 500, 872], \"label\": \"Text\"}, {\"bbox_2d\": [677, 807, 895, 815], \"label\": \"Text\"}, {\"bbox_2d\": [677, 824, 709, 832], \"label\": \"Text\"}, {\"bbox_2d\": [677, 499, 709, 507], \"label\": \"Text\"}, {\"bbox_2d\": [677, 577, 709, 585], \"label\": \"Text\"}, {\"bbox_2d\": [677, 681, 709, 689], \"label\": \"Text\"}, {\"bbox_2d\": [677, 769, 709, 777], \"label\": \"Text\"}, {\"bbox_2d\": [677, 710, 709, 718], \"label\": \"Text\"}, {\"bbox_2d\": [677, 835, 709, 843], \"label\": \"Text\"}, {\"bbox_2d\": [677, 615, 709, 623], \"label\": \"Text\"}, {\"bbox_2d\": [677, 452, 709, 460], \"label\": \"Text\"}, {\"bbox_2d\": [677, 367, 709, 375], \"label\": \"Text\"}, {\"bbox_2d\": [677, 263, 709, 271], \"label\": \"Text\"}, {\"bbox_2d\": [677, 298, 709, 306], \"label\": \"Text\"}, {\"bbox_2d\": [677, 398, 709, 406], \"label\": \"Text\"}, {\"bbox_2d\": [677, 152, 895, 256], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 258, 609, 272], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 275, 609, 283], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 286, 609, 294], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 310, 609, 321], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 323, 609, 331], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 398, 609, 406], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 460, 609, 468], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 495, 609, 503], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 529, 609, 537], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 564, 609, 572], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 649, 609, 657], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 674, 609, 682], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 757, 609, 765], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 782, 609, 790], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 828, 609, 836], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 863, 609, 871], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 898, 609, 906], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 933, 609, 941], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 967, 609, 975], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 907, 609, 915], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 941, 609, 950], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 976, 609, 984], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 701, 609, 709], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 741, 609, 749], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 686, 609, 694], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 710, 609, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 622, 609, 630], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 525, 609, 533], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 559, 609, 567], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 495, 609, 503], \"", "label": "```json[{\"bbox_2d\": [698, 93, 721, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 74, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 325, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [553, 136, 895, 261], \"label\": \"Picture\"}, {\"bbox_2d\": [552, 262, 895, 389], \"label\": \"Picture\"}, {\"bbox_2d\": [552, 393, 895, 521], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 524, 895, 653], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 655, 895, 797], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 804, 896, 971], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 137, 499, 243], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 263, 500, 368], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 388, 499, 493], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 515, 499, 638], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 644, 501, 748], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 752, 499, 857], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 862, 500, 966], \"label\": \"Picture\"}, {\"bbox_2d\": [677, 125, 745, 135], \"label\": \"Text\"}, {\"bbox_2d\": [282, 125, 351, 135], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 43, 346, 56], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 242, 218, 256], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 453, 212, 466], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 528, 215, 541], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 350, 958, 378], \"label\": \"Text\"}, {\"bbox_2d\": [85, 393, 937, 438], \"label\": \"Text\"}, {\"bbox_2d\": [85, 469, 952, 514], \"label\": \"Text\"}, {\"bbox_2d\": [85, 544, 942, 572], \"label\": \"Text\"}, {\"bbox_2d\": [85, 588, 935, 616], \"label\": \"Text\"}, {\"bbox_2d\": [85, 631, 557, 644], \"label\": \"Text\"}, {\"bbox_2d\": [85, 852, 935, 880], \"label\": \"Text\"}, {\"bbox_2d\": [85, 899, 958, 940], \"label\": \"Text\"}, {\"bbox_2d\": [801, 999, 965, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 826, 680, 838], \"label\": \"Text\"}, {\"bbox_2d\": [85, 57, 949, 105], \"label\": \"Text\"}, {\"bbox_2d\": [85, 119, 944, 227], \"label\": \"Text\"}, {\"bbox_2d\": [85, 259, 958, 334], \"label\": \"Text\"}, {\"bbox_2d\": [85, 657, 963, 822], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [85, 39, 346, 58], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 238, 218, 257], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 58, 953, 105], \"label\": \"Text\"}, {\"bbox_2d\": [85, 117, 951, 228], \"label\": \"Text\"}, {\"bbox_2d\": [85, 256, 961, 335], \"label\": \"Text\"}, {\"bbox_2d\": [85, 347, 963, 378], \"label\": \"Text\"}, {\"bbox_2d\": [85, 391, 942, 438], \"label\": \"Text\"}, {\"bbox_2d\": [85, 466, 957, 513], \"label\": \"Text\"}, {\"bbox_2d\": [85, 526, 215, 541], \"label\": \"Text\"}, {\"bbox_2d\": [85, 542, 947, 573], \"label\": \"Text\"}, {\"bbox_2d\": [85, 585, 941, 617], \"label\": \"Text\"}, {\"bbox_2d\": [85, 629, 557, 644], \"label\": \"Text\"}, {\"bbox_2d\": [85, 448, 213, 467], \"label\": \"Section-header\"}, {\"bbox_2d\": [84, 653, 969, 822], \"label\": \"Table\"}, {\"bbox_2d\": [85, 824, 682, 836], \"label\": \"Text\"}, {\"bbox_2d\": [85, 850, 940, 881], \"label\": \"Text\"}, {\"bbox_2d\": [85, 893, 963, 940], \"label\": \"Text\"}, {\"bbox_2d\": [802, 998, 965, 1009], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 273, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 158, 911, 311], \"label\": \"Text\"}, {\"bbox_2d\": [121, 444, 803, 460], \"label\": \"Text\"}, {\"bbox_2d\": [121, 531, 911, 602], \"label\": \"Text\"}, {\"bbox_2d\": [121, 611, 911, 792], \"label\": \"Text\"}, {\"bbox_2d\": [121, 863, 911, 961], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [372, 807, 911, 846], \"label\": \"Formula\"}, {\"bbox_2d\": [431, 482, 911, 504], \"label\": \"Formula\"}, {\"bbox_2d\": [356, 333, 911, 360], \"label\": \"Formula\"}, {\"bbox_2d\": [356, 367, 911, 394], \"label\": \"Formula\"}, {\"bbox_2d\": [356, 397, 911, 424], \"label\": \"Formula\"}, {\"bbox_2d\": [400, 119, 911, 137], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 274, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 156, 911, 311], \"label\": \"Text\"}, {\"bbox_2d\": [121, 444, 803, 459], \"label\": \"Text\"}, {\"bbox_2d\": [121, 530, 911, 600], \"label\": \"Text\"}, {\"bbox_2d\": [121, 612, 911, 794], \"label\": \"Text\"}, {\"bbox_2d\": [121, 862, 911, 960], \"label\": \"Text\"}, {\"bbox_2d\": [374, 808, 911, 848], \"label\": \"Formula\"}, {\"bbox_2d\": [431, 474, 911, 532], \"label\": \"Formula\"}, {\"bbox_2d\": [357, 398, 911, 427], \"label\": \"Formula\"}, {\"bbox_2d\": [357, 367, 911, 396], \"label\": \"Formula\"}, {\"bbox_2d\": [358, 335, 911, 356], \"label\": \"Formula\"}, {\"bbox_2d\": [398, 120, 911, 149], \"label\": \"Formula\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [900, 64, 914, 76], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 100, 914, 176], \"label\": \"Text\"}, {\"bbox_2d\": [121, 214, 332, 229], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 251, 914, 476], \"label\": \"Text\"}, {\"bbox_2d\": [121, 573, 914, 678], \"label\": \"Text\"}, {\"bbox_2d\": [121, 695, 914, 858], \"label\": \"Text\"}, {\"bbox_2d\": [121, 874, 914, 920], \"label\": \"Text\"}, {\"bbox_2d\": [869, 505, 914, 523], \"label\": \"Formula\"}, {\"bbox_2d\": [359, 468, 679, 518], \"label\": \"Formula\"}, {\"bbox_2d\": [360, 536, 614, 551], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 957, 210, 968], \"label\": \"Page-footer\"}, {\"bbox_2d\": [869, 957, 914, 968], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [900, 63, 914, 72], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 101, 914, 175], \"label\": \"Text\"}, {\"bbox_2d\": [121, 250, 925, 476], \"label\": \"Text\"}, {\"bbox_2d\": [121, 216, 332, 230], \"label\": \"Section-header\"}, {\"bbox_2d\": [360, 473, 914, 552], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 573, 914, 678], \"label\": \"Text\"}, {\"bbox_2d\": [121, 694, 925, 859], \"label\": \"Text\"}, {\"bbox_2d\": [121, 875, 914, 919], \"label\": \"Text\"}, {\"bbox_2d\": [869, 959, 914, 968], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 959, 210, 968], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [208, 71, 701, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [208, 93, 914, 167], \"label\": \"Text\"}, {\"bbox_2d\": [146, 967, 292, 990], \"label\": \"Page-footer\"}, {\"bbox_2d\": [829, 967, 902, 979], \"label\": \"Page-footer\"}, {\"bbox_2d\": [144, 176, 1034, 440], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [209, 69, 706, 84], \"label\": \"Text\"}, {\"bbox_2d\": [209, 92, 917, 170], \"label\": \"Text\"}, {\"bbox_2d\": [143, 176, 1035, 439], \"label\": \"Table\"}, {\"bbox_2d\": [148, 966, 297, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [829, 966, 904, 978], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [680, 44, 937, 59], \"label\": \"Page-header\"}, {\"bbox_2d\": [356, 96, 678, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 135, 877, 268], \"label\": \"Table\"}, {\"bbox_2d\": [98, 289, 937, 357], \"label\": \"Text\"}, {\"bbox_2d\": [98, 398, 937, 446], \"label\": \"Text\"}, {\"bbox_2d\": [98, 505, 937, 554], \"label\": \"Text\"}, {\"bbox_2d\": [98, 578, 937, 645], \"label\": \"Text\"}, {\"bbox_2d\": [98, 669, 801, 680], \"label\": \"Text\"}, {\"bbox_2d\": [98, 686, 491, 697], \"label\": \"Text\"}, {\"bbox_2d\": [98, 703, 724, 717], \"label\": \"Text\"}, {\"bbox_2d\": [98, 979, 182, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [911, 959, 952, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 362, 221, 375], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 453, 264, 465], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 489, 236, 500], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 721, 964, 815], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [679, 40, 942, 60], \"label\": \"Page-header\"}, {\"bbox_2d\": [356, 90, 686, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 285, 945, 358], \"label\": \"Text\"}, {\"bbox_2d\": [98, 393, 945, 449], \"label\": \"Text\"}, {\"bbox_2d\": [98, 501, 947, 557], \"label\": \"Text\"}, {\"bbox_2d\": [98, 573, 946, 647], \"label\": \"Text\"}, {\"bbox_2d\": [98, 663, 799, 683], \"label\": \"Text\"}, {\"bbox_2d\": [98, 681, 494, 702], \"label\": \"Text\"}, {\"bbox_2d\": [98, 131, 880, 268], \"label\": \"Table\"}, {\"bbox_2d\": [98, 357, 226, 377], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 447, 268, 467], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 483, 241, 503], \"label\": \"Section-header\"}, {\"bbox_2d\": [96, 716, 963, 813], \"label\": \"Table\"}, {\"bbox_2d\": [98, 699, 727, 720], \"label\": \"Text\"}, {\"bbox_2d\": [930, 955, 957, 975], \"label\": \"Page-footer\"}, {\"bbox_2d\": [913, 971, 959, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 974, 187, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [336, 81, 812, 129], \"label\": \"Text\"}, {\"bbox_2d\": [333, 162, 896, 192], \"label\": \"Text\"}, {\"bbox_2d\": [333, 200, 517, 215], \"label\": \"Text\"}, {\"bbox_2d\": [333, 223, 854, 254], \"label\": \"Text\"}, {\"bbox_2d\": [333, 262, 893, 293], \"label\": \"Text\"}, {\"bbox_2d\": [274, 344, 910, 388], \"label\": \"Text\"}, {\"bbox_2d\": [274, 407, 871, 421], \"label\": \"Text\"}, {\"bbox_2d\": [101, 314, 858, 333], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [334, 85, 419, 98], \"label\": \"Text\"}, {\"bbox_2d\": [395, 101, 811, 129], \"label\": \"Text\"}, {\"bbox_2d\": [334, 163, 895, 192], \"label\": \"Text\"}, {\"bbox_2d\": [334, 202, 516, 216], \"label\": \"Text\"}, {\"bbox_2d\": [273, 140, 430, 153], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 226, 854, 255], \"label\": \"List-item\"}, {\"bbox_2d\": [334, 265, 893, 294], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 316, 859, 333], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 345, 908, 389], \"label\": \"Text\"}, {\"bbox_2d\": [273, 407, 870, 421], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [526, 38, 546, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [114, 76, 343, 92], \"label\": \"Text\"}, {\"bbox_2d\": [114, 96, 960, 133], \"label\": \"Text\"}, {\"bbox_2d\": [114, 137, 960, 190], \"label\": \"Text\"}, {\"bbox_2d\": [114, 194, 960, 231], \"label\": \"Text\"}, {\"bbox_2d\": [114, 234, 960, 271], \"label\": \"Text\"}, {\"bbox_2d\": [114, 274, 960, 307], \"label\": \"Text\"}, {\"bbox_2d\": [114, 311, 960, 368], \"label\": \"Text\"}, {\"bbox_2d\": [114, 371, 960, 408], \"label\": \"Text\"}, {\"bbox_2d\": [114, 411, 960, 468], \"label\": \"Text\"}, {\"bbox_2d\": [114, 472, 960, 509], \"label\": \"Text\"}, {\"bbox_2d\": [114, 512, 960, 545], \"label\": \"Text\"}, {\"bbox_2d\": [114, 548, 803, 585], \"label\": \"Text\"}, {\"bbox_2d\": [114, 588, 960, 645], \"label\": \"Text\"}, {\"bbox_2d\": [114, 648, 960, 819], \"label\": \"Text\"}, {\"bbox_2d\": [114, 823, 960, 860], \"label\": \"Text\"}, {\"bbox_2d\": [114, 864, 960, 919], \"label\": \"Text\"}, {\"bbox_2d\": [157, 922, 960, 940], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [526, 41, 543, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 76, 340, 89], \"label\": \"Text\"}, {\"bbox_2d\": [114, 96, 957, 152], \"label\": \"Text\"}, {\"bbox_2d\": [115, 155, 957, 191], \"label\": \"Text\"}, {\"bbox_2d\": [115, 195, 956, 231], \"label\": \"Text\"}, {\"bbox_2d\": [114, 234, 956, 269], \"label\": \"Text\"}, {\"bbox_2d\": [115, 272, 957, 304], \"label\": \"Text\"}, {\"bbox_2d\": [115, 312, 957, 368], \"label\": \"Text\"}, {\"bbox_2d\": [114, 370, 957, 407], \"label\": \"Text\"}, {\"bbox_2d\": [115, 409, 957, 505], \"label\": \"Text\"}, {\"bbox_2d\": [114, 508, 957, 545], \"label\": \"Text\"}, {\"bbox_2d\": [114, 547, 956, 584], \"label\": \"Text\"}, {\"bbox_2d\": [114, 587, 956, 642], \"label\": \"Text\"}, {\"bbox_2d\": [113, 645, 956, 819], \"label\": \"Text\"}, {\"bbox_2d\": [113, 822, 955, 859], \"label\": \"Text\"}, {\"bbox_2d\": [113, 861, 955, 917], \"label\": \"Text\"}, {\"bbox_2d\": [159, 920, 955, 939], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 43, 562, 57], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 72, 472, 101], \"label\": \"Text\"}, {\"bbox_2d\": [85, 115, 962, 208], \"label\": \"Text\"}, {\"bbox_2d\": [85, 222, 947, 315], \"label\": \"Text\"}, {\"bbox_2d\": [85, 330, 962, 487], \"label\": \"Text\"}, {\"bbox_2d\": [85, 499, 963, 561], \"label\": \"Text\"}, {\"bbox_2d\": [85, 575, 958, 621], \"label\": \"Text\"}, {\"bbox_2d\": [85, 635, 923, 680], \"label\": \"Text\"}, {\"bbox_2d\": [85, 745, 244, 790], \"label\": \"Text\"}, {\"bbox_2d\": [801, 999, 965, 1008], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [85, 39, 562, 58], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 69, 472, 101], \"label\": \"Text\"}, {\"bbox_2d\": [85, 113, 966, 208], \"label\": \"Text\"}, {\"bbox_2d\": [85, 220, 965, 315], \"label\": \"Text\"}, {\"bbox_2d\": [85, 327, 966, 486], \"label\": \"Text\"}, {\"bbox_2d\": [85, 498, 967, 561], \"label\": \"Text\"}, {\"bbox_2d\": [85, 574, 962, 621], \"label\": \"Text\"}, {\"bbox_2d\": [85, 633, 927, 680], \"label\": \"Text\"}, {\"bbox_2d\": [85, 743, 243, 790], \"label\": \"Text\"}, {\"bbox_2d\": [802, 998, 965, 1009], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [783, 73, 887, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 521, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 308, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 143, 500, 306], \"label\": \"Text\"}, {\"bbox_2d\": [130, 308, 501, 517], \"label\": \"Text\"}, {\"bbox_2d\": [130, 520, 501, 850], \"label\": \"Text\"}, {\"bbox_2d\": [130, 853, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 143, 895, 415], \"label\": \"Text\"}, {\"bbox_2d\": [526, 418, 894, 583], \"label\": \"Text\"}, {\"bbox_2d\": [526, 585, 894, 679], \"label\": \"Text\"}, {\"bbox_2d\": [526, 682, 895, 826], \"label\": \"Text\"}, {\"bbox_2d\": [526, 828, 894, 982], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [781, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 525, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 142, 501, 303], \"label\": \"Text\"}, {\"bbox_2d\": [130, 309, 500, 520], \"label\": \"Text\"}, {\"bbox_2d\": [130, 526, 503, 849], \"label\": \"Text\"}, {\"bbox_2d\": [130, 855, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 898, 413], \"label\": \"Text\"}, {\"bbox_2d\": [526, 418, 896, 519], \"label\": \"Text\"}, {\"bbox_2d\": [526, 524, 896, 678], \"label\": \"Text\"}, {\"bbox_2d\": [526, 683, 898, 797], \"label\": \"Text\"}, {\"bbox_2d\": [526, 802, 898, 982], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 72, 945, 339], \"label\": \"Text\"}, {\"bbox_2d\": [91, 818, 945, 959], \"label\": \"Text\"}, {\"bbox_2d\": [91, 778, 519, 793], \"label\": \"Section-header\"}, {\"bbox_2d\": [187, 706, 847, 721], \"label\": \"Caption\"}, {\"bbox_2d\": [257, 373, 766, 658], \"label\": \"Picture\"}, {\"bbox_2d\": [507, 994, 528, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 75, 944, 339], \"label\": \"Text\"}, {\"bbox_2d\": [91, 819, 944, 958], \"label\": \"Text\"}, {\"bbox_2d\": [91, 778, 519, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [188, 707, 847, 721], \"label\": \"Caption\"}, {\"bbox_2d\": [508, 993, 527, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [261, 374, 769, 660], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [80, 41, 701, 75], \"label\": \"Section-header\"}, {\"bbox_2d\": [107, 92, 405, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [110, 123, 924, 933], \"label\": \"Table\"}, {\"bbox_2d\": [62, 991, 330, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [105, 95, 405, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [65, 991, 441, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [101, 134, 945, 935], \"label\": \"Table\"}, {\"bbox_2d\": [77, 43, 695, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [105, 124, 472, 136], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 29, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [398, 85, 634, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 151, 817, 171], \"label\": \"Section-header\"}, {\"bbox_2d\": [93, 209, 944, 710], \"label\": \"Picture\"}, {\"bbox_2d\": [101, 741, 929, 822], \"label\": \"Text\"}, {\"bbox_2d\": [101, 853, 929, 932], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [397, 75, 638, 125], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 142, 819, 175], \"label\": \"Section-header\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [100, 737, 938, 825], \"label\": \"Text\"}, {\"bbox_2d\": [100, 847, 937, 936], \"label\": \"Text\"}, {\"bbox_2d\": [93, 209, 945, 713], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [368, 81, 669, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [114, 146, 506, 332], \"label\": \"Text\"}, {\"bbox_2d\": [528, 146, 921, 177], \"label\": \"Text\"}, {\"bbox_2d\": [528, 184, 921, 488], \"label\": \"Text\"}, {\"bbox_2d\": [114, 414, 506, 758], \"label\": \"Text\"}, {\"bbox_2d\": [528, 526, 921, 733], \"label\": \"Text\"}, {\"bbox_2d\": [528, 739, 921, 889], \"label\": \"Text\"}, {\"bbox_2d\": [114, 764, 506, 948], \"label\": \"Text\"}, {\"bbox_2d\": [528, 900, 921, 929], \"label\": \"Text\"}, {\"bbox_2d\": [528, 506, 730, 518], \"label\": \"Section-header\"}, {\"bbox_2d\": [114, 338, 506, 395], \"label\": \"Table\"}, {\"bbox_2d\": [511, 999, 525, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [367, 79, 668, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 144, 511, 331], \"label\": \"Text\"}, {\"bbox_2d\": [172, 339, 506, 350], \"label\": \"Text\"}, {\"bbox_2d\": [113, 364, 508, 394], \"label\": \"Table\"}, {\"bbox_2d\": [115, 412, 511, 754], \"label\": \"Text\"}, {\"bbox_2d\": [115, 763, 511, 949], \"label\": \"Text\"}, {\"bbox_2d\": [527, 144, 923, 175], \"label\": \"Text\"}, {\"bbox_2d\": [527, 183, 923, 486], \"label\": \"Text\"}, {\"bbox_2d\": [527, 524, 923, 730], \"label\": \"Text\"}, {\"bbox_2d\": [527, 739, 924, 886], \"label\": \"Text\"}, {\"bbox_2d\": [527, 894, 920, 925], \"label\": \"Text\"}, {\"bbox_2d\": [512, 999, 523, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [527, 504, 729, 516], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 378, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 243, 314, 255], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 404, 501, 418], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 435, 387, 446], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 273, 914, 313], \"label\": \"Text\"}, {\"bbox_2d\": [121, 330, 914, 385], \"label\": \"Text\"}, {\"bbox_2d\": [121, 450, 785, 461], \"label\": \"Text\"}, {\"bbox_2d\": [121, 610, 914, 709], \"label\": \"Text\"}, {\"bbox_2d\": [121, 726, 478, 737], \"label\": \"Text\"}, {\"bbox_2d\": [121, 864, 914, 904], \"label\": \"Text\"}, {\"bbox_2d\": [150, 465, 914, 491], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 496, 914, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 539, 914, 565], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 566, 914, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [497, 942, 536, 953], \"label\": \"Page-footer\"}, {\"bbox_2d\": [137, 756, 216, 847], \"label\": \"Table\"}, {\"bbox_2d\": [383, 756, 678, 847], \"label\": \"Table\"}, {\"bbox_2d\": [150, 110, 502, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 126, 494, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 142, 707, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 157, 537, 168], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 172, 914, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 202, 914, 227], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 379, 108], \"label\": \"Text\"}, {\"bbox_2d\": [152, 108, 505, 124], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 123, 499, 138], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 138, 710, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 153, 540, 168], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 168, 915, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 197, 912, 227], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 461, 913, 491], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 491, 913, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 535, 917, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 564, 917, 594], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 270, 912, 313], \"label\": \"Text\"}, {\"bbox_2d\": [121, 327, 915, 385], \"label\": \"Text\"}, {\"bbox_2d\": [121, 447, 786, 462], \"label\": \"Text\"}, {\"bbox_2d\": [121, 608, 914, 709], \"label\": \"Text\"}, {\"bbox_2d\": [121, 722, 482, 737], \"label\": \"Text\"}, {\"bbox_2d\": [138, 754, 222, 847], \"label\": \"Text\"}, {\"bbox_2d\": [383, 754, 681, 847], \"label\": \"Text\"}, {\"bbox_2d\": [121, 861, 914, 905], \"label\": \"Text\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 432, 387, 447], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 399, 505, 419], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 241, 315, 256], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [76, 102, 971, 827], \"label\": \"Picture\"}, {\"bbox_2d\": [76, 842, 232, 863], \"label\": \"Section-header\"}, {\"bbox_2d\": [76, 873, 232, 889], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 894, 232, 911], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 926, 232, 951], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label\": \"List-item\"}, {\"bbox_2d\": [76, 967, 223, 978], \"label", "label": "```json[{\"bbox_2d\": [67, 104, 970, 1012], \"label\": \"Picture\"}, {\"bbox_2d\": [77, 51, 289, 66], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [891, 18, 905, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [953, 14, 997, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 100, 450, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 288, 375, 299], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 398, 192, 409], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 414, 419, 425], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 429, 406, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 445, 302, 457], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 833, 344, 844], \"label\": \"Section-header\"}, {\"bbox_2d\": [186, 849, 409, 860], \"label\": \"Section-header\"}, {\"bbox_2d\": [862, 999, 904, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [862, 999, 991, 1017], \"label\": \"Page-footer\"}, {\"bbox_2d\": [186, 864, 963, 988], \"label\": \"Table\"}, {\"bbox_2d\": [186, 618, 962, 756], \"label\": \"Table\"}, {\"bbox_2d\": [186, 459, 963, 598], \"label\": \"Table\"}, {\"bbox_2d\": [186, 760, 837, 770], \"label\": \"Text\"}, {\"bbox_2d\": [219, 772, 796, 782], \"label\": \"Text\"}, {\"bbox_2d\": [219, 784, 643, 793], \"label\": \"Text\"}, {\"bbox_2d\": [219, 795, 529, 805], \"label\": \"Text\"}, {\"bbox_2d\": [219, 807, 820, 817], \"label\": \"Text\"}, {\"bbox_2d\": [138, 117, 957, 176], \"label\": \"Text\"}, {\"bbox_2d\": [138, 181, 942, 238], \"label\": \"Text\"}, {\"bbox_2d\": [138, 243, 954, 268], \"label\": \"Text\"}, {\"bbox_2d\": [138, 306, 951, 379], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [120, 101, 450, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 289, 374, 300], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 399, 192, 409], \"label\": \"Section-header\"}, {\"bbox_2d\": [780, 20, 905, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [953, 18, 997, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [137, 117, 962, 175], \"label\": \"Text\"}, {\"bbox_2d\": [137, 180, 948, 237], \"label\": \"Text\"}, {\"bbox_2d\": [137, 242, 959, 269], \"label\": \"Text\"}, {\"bbox_2d\": [137, 305, 954, 378], \"label\": \"Text\"}, {\"bbox_2d\": [139, 415, 420, 425], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 430, 405, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [161, 834, 343, 844], \"label\": \"Section-header\"}, {\"bbox_2d\": [185, 867, 963, 989], \"label\": \"Table\"}, {\"bbox_2d\": [184, 618, 964, 754], \"label\": \"Table\"}, {\"bbox_2d\": [185, 460, 963, 597], \"label\": \"Table\"}, {\"bbox_2d\": [161, 446, 302, 456], \"label\": \"Section-header\"}, {\"bbox_2d\": [185, 849, 410, 860], \"label\": \"Section-header\"}, {\"bbox_2d\": [185, 762, 839, 770], \"label\": \"Text\"}, {\"bbox_2d\": [219, 773, 796, 781], \"label\": \"Text\"}, {\"bbox_2d\": [219, 783, 646, 791], \"label\": \"Text\"}, {\"bbox_2d\": [219, 794, 937, 812], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 77, 1009, 97], \"label\": \"Text\"}, {\"bbox_2d\": [25, 107, 1009, 208], \"label\": \"Text\"}, {\"bbox_2d\": [25, 220, 1009, 253], \"label\": \"Text\"}, {\"bbox_2d\": [25, 259, 1009, 373], \"label\": \"Text\"}, {\"bbox_2d\": [25, 400, 1009, 492], \"label\": \"Text\"}, {\"bbox_2d\": [25, 501, 1009, 546], \"label\": \"Text\"}, {\"bbox_2d\": [25, 586, 1009, 640], \"label\": \"Text\"}, {\"bbox_2d\": [42, 386, 757, 398], \"label\": \"Section-header\"}, {\"bbox_2d\": [42, 560, 1009, 580], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 666, 525, 677], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 75, 1009, 97], \"label\": \"Text\"}, {\"bbox_2d\": [25, 106, 1009, 208], \"label\": \"Text\"}, {\"bbox_2d\": [25, 260, 1009, 372], \"label\": \"Text\"}, {\"bbox_2d\": [25, 401, 1009, 492], \"label\": \"Text\"}, {\"bbox_2d\": [25, 501, 1009, 545], \"label\": \"Text\"}, {\"bbox_2d\": [25, 586, 1009, 642], \"label\": \"Text\"}, {\"bbox_2d\": [508, 665, 525, 676], \"label\": \"Page-footer\"}, {\"bbox_2d\": [43, 559, 1009, 581], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 386, 756, 397], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 221, 1009, 255], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 904, 114], \"label\": \"Text\"}, {\"bbox_2d\": [274, 131, 908, 238], \"label\": \"Text\"}, {\"bbox_2d\": [274, 255, 910, 489], \"label\": \"Text\"}, {\"bbox_2d\": [274, 506, 909, 598], \"label\": \"Text\"}, {\"bbox_2d\": [274, 616, 905, 645], \"label\": \"Text\"}, {\"bbox_2d\": [274, 722, 910, 830], \"label\": \"Text\"}, {\"bbox_2d\": [274, 847, 898, 877], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 379, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 651, 463, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 670, 432, 684], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 690, 566, 704], \"label\": \"List-item\"}, {\"bbox_2d\": [272, 442, 584, 457], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 903, 114], \"label\": \"Text\"}, {\"bbox_2d\": [273, 132, 908, 239], \"label\": \"Text\"}, {\"bbox_2d\": [273, 257, 911, 490], \"label\": \"Text\"}, {\"bbox_2d\": [273, 508, 909, 599], \"label\": \"Text\"}, {\"bbox_2d\": [273, 617, 905, 646], \"label\": \"Text\"}, {\"bbox_2d\": [273, 723, 911, 830], \"label\": \"Text\"}, {\"bbox_2d\": [273, 848, 897, 877], \"label\": \"Text\"}, {\"bbox_2d\": [273, 650, 463, 666], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 670, 432, 685], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 690, 565, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 141, 103, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 136, 495, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [556, 234, 940, 291], \"label\": \"Text\"}, {\"bbox_2d\": [131, 234, 514, 403], \"label\": \"Text\"}, {\"bbox_2d\": [131, 435, 521, 604], \"label\": \"Text\"}, {\"bbox_2d\": [556, 322, 947, 493], \"label\": \"Text\"}, {\"bbox_2d\": [556, 547, 945, 695], \"label\": \"Text\"}, {\"bbox_2d\": [556, 727, 933, 760], \"label\": \"Text\"}, {\"bbox_2d\": [131, 637, 520, 650], \"label\": \"Text\"}, {\"bbox_2d\": [138, 679, 522, 923], \"label\": \"Table\"}, {\"bbox_2d\": [565, 792, 947, 877], \"label\": \"Table\"}, {\"bbox_2d\": [556, 525, 722, 538], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [91, 132, 497, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 229, 518, 404], \"label\": \"Text\"}, {\"bbox_2d\": [130, 431, 522, 605], \"label\": \"Text\"}, {\"bbox_2d\": [130, 632, 521, 650], \"label\": \"Text\"}, {\"bbox_2d\": [555, 229, 941, 292], \"label\": \"Text\"}, {\"bbox_2d\": [555, 319, 947, 493], \"label\": \"Text\"}, {\"bbox_2d\": [555, 543, 946, 695], \"label\": \"Text\"}, {\"bbox_2d\": [555, 722, 934, 762], \"label\": \"Text\"}, {\"bbox_2d\": [555, 521, 725, 538], \"label\": \"Section-header\"}, {\"bbox_2d\": [560, 784, 950, 880], \"label\": \"Table\"}, {\"bbox_2d\": [135, 674, 526, 926], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [904, 47, 911, 66], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 151, 911, 331], \"label\": \"Text\"}, {\"bbox_2d\": [121, 418, 911, 489], \"label\": \"Text\"}, {\"bbox_2d\": [121, 675, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [225, 498, 911, 656], \"label\": \"Formula\"}, {\"bbox_2d\": [301, 374, 734, 385], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 150, 911, 341], \"label\": \"Text\"}, {\"bbox_2d\": [901, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 418, 911, 488], \"label\": \"Text\"}, {\"bbox_2d\": [121, 674, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [299, 373, 733, 387], \"label\": \"Section-header\"}, {\"bbox_2d\": [221, 504, 911, 667], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [230, 534, 926, 580], \"label\": \"Text\"}, {\"bbox_2d\": [230, 618, 897, 664], \"label\": \"Text\"}, {\"bbox_2d\": [230, 679, 922, 755], \"label\": \"Text\"}, {\"bbox_2d\": [230, 795, 917, 841], \"label\": \"Text\"}, {\"bbox_2d\": [230, 856, 926, 917], \"label\": \"Text\"}, {\"bbox_2d\": [231, 774, 367, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 598, 723, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 499, 658, 512], \"label\": \"Caption\"}, {\"bbox_2d\": [230, 101, 928, 498], \"label\": \"Picture\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [228, 102, 928, 497], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 501, 657, 512], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 535, 926, 579], \"label\": \"Text\"}, {\"bbox_2d\": [231, 620, 902, 664], \"label\": \"Text\"}, {\"bbox_2d\": [231, 680, 924, 755], \"label\": \"Text\"}, {\"bbox_2d\": [231, 797, 916, 840], \"label\": \"Text\"}, {\"bbox_2d\": [231, 857, 925, 916], \"label\": \"Text\"}, {\"bbox_2d\": [231, 600, 724, 615], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 777, 369, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 103, 419, 119], \"label\": \"Page-header\"}, {\"bbox_2d\": [850, 103, 995, 134], \"label\": \"Page-header\"}, {\"bbox_2d\": [139, 142, 293, 225], \"label\": \"Picture\"}, {\"bbox_2d\": [435, 189, 642, 205], \"label\": \"Text\"}, {\"bbox_2d\": [82, 261, 401, 276], \"label\": \"Text\"}, {\"bbox_2d\": [710, 261, 998, 276], \"label\": \"Text\"}, {\"bbox_2d\": [82, 302, 501, 316], \"label\": \"Text\"}, {\"bbox_2d\": [86, 340, 490, 352], \"label\": \"Text\"}, {\"bbox_2d\": [85, 360, 482, 372], \"label\": \"Text\"}, {\"bbox_2d\": [85, 380, 480, 392], \"label\": \"Text\"}, {\"bbox_2d\": [85, 400, 381, 412], \"label\": \"Text\"}, {\"bbox_2d\": [85, 420, 372, 432], \"label\": \"Text\"}, {\"bbox_2d\": [85, 461, 372, 492], \"label\": \"Text\"}, {\"bbox_2d\": [109, 518, 334, 549], \"label\": \"Text\"}, {\"bbox_2d\": [85, 499, 196, 513], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 621, 196, 634], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 660, 161, 672], \"label\": \"Section-header\"}, {\"bbox_2d\": [75, 679, 534, 912], \"label\": \"Text\"}, {\"bbox_2d\": [587, 302, 1004, 355], \"label\": \"Text\"}, {\"bbox_2d\": [556, 383, 924, 395], \"label\": \"Text\"}, {\"bbox_2d\": [651, 402, 861, 414], \"label\": \"Text\"}, {\"bbox_2d\": [599, 363, 731, 375], \"label\": \"Text\"}, {\"bbox_2d\": [601, 591, 900, 603], \"label\": \"Text\"}, {\"bbox_2d\": [129, 640, 511, 652], \"label\": \"Text\"}, {\"bbox_2d\": [617, 721, 927, 876], \"label\": \"Picture\"}, {\"bbox_2d\": [163, 441, 315, 491], \"label\": \"Section-header\"}, {\"bbox_2d\": [352, 149, 735, 169], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [136, 143, 295, 228], \"label\": \"Picture\"}, {\"bbox_2d\": [356, 146, 731, 169], \"label\": \"Title\"}, {\"bbox_2d\": [87, 102, 416, 119], \"label\": \"Title\"}, {\"bbox_2d\": [436, 191, 642, 203], \"label\": \"Title\"}, {\"bbox_2d\": [852, 102, 956, 115], \"label\": \"List-item\"}, {\"bbox_2d\": [804, 122, 996, 135], \"label\": \"Text\"}, {\"bbox_2d\": [85, 262, 399, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [711, 262, 996, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 303, 499, 314], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 322, 178, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [119, 341, 490, 353], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 361, 481, 373], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 380, 475, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 401, 381, 413], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 420, 372, 453], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 460, 372, 492], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 500, 195, 511], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 517, 335, 551], \"label\": \"Table\"}, {\"bbox_2d\": [85, 622, 195, 633], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 660, 160, 671], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 640, 510, 653], \"label\": \"Text\"}, {\"bbox_2d\": [76, 678, 528, 911], \"label\": \"Text\"}, {\"bbox_2d\": [602, 592, 897, 603], \"label\": \"Text\"}, {\"bbox_2d\": [559, 382, 922, 415], \"label\": \"List-item\"}, {\"bbox_2d\": [590, 303, 1002, 353], \"label\": \"Text\"}, {\"bbox_2d\": [601, 362, 729, 374], \"label\": \"Text\"}, {\"bbox_2d\": [616, 723, 925, 877], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 103, 350, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 119, 504, 161], \"label\": \"Text\"}, {\"bbox_2d\": [122, 164, 491, 194], \"label\": \"Text\"}, {\"bbox_2d\": [122, 242, 304, 254], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 425, 268, 438], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 639, 300, 652], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 732, 260, 744], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 442, 511, 596], \"label\": \"Text\"}, {\"bbox_2d\": [565, 442, 949, 470], \"label\": \"Text\"}, {\"bbox_2d\": [565, 473, 946, 516], \"label\": \"Text\"}, {\"bbox_2d\": [565, 519, 948, 596], \"label\": \"Text\"}, {\"bbox_2d\": [565, 654, 925, 701], \"label\": \"Text\"}, {\"bbox_2d\": [122, 656, 510, 685], \"label\": \"Text\"}, {\"bbox_2d\": [117, 751, 961, 811], \"label\": \"Table\"}, {\"bbox_2d\": [862, 1001, 975, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [118, 262, 963, 389], \"label\": \"Table\"}, {\"bbox_2d\": [565, 119, 939, 192], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [125, 104, 350, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 244, 305, 255], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 428, 268, 440], \"label\": \"Section-header\"}, {\"bbox_2d\": [125, 642, 300, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 734, 260, 744], \"label\": \"Section-header\"}, {\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [117, 753, 966, 811], \"label\": \"Table\"}, {\"bbox_2d\": [118, 263, 965, 388], \"label\": \"Table\"}, {\"bbox_2d\": [566, 120, 945, 193], \"label\": \"Text\"}, {\"bbox_2d\": [120, 120, 509, 162], \"label\": \"Text\"}, {\"bbox_2d\": [120, 167, 497, 193], \"label\": \"Text\"}, {\"bbox_2d\": [120, 444, 519, 596], \"label\": \"Text\"}, {\"bbox_2d\": [566, 444, 954, 471], \"label\": \"Text\"}, {\"bbox_2d\": [566, 476, 948, 518], \"label\": \"Text\"}, {\"bbox_2d\": [566, 523, 948, 596], \"label\": \"Text\"}, {\"bbox_2d\": [120, 658, 514, 685], \"label\": \"Text\"}, {\"bbox_2d\": [566, 658, 930, 700], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [781, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 315, 501, 540], \"label\": \"Text\"}, {\"bbox_2d\": [130, 548, 501, 639], \"label\": \"Text\"}, {\"bbox_2d\": [130, 647, 501, 835], \"label\": \"Text\"}, {\"bbox_2d\": [130, 843, 501, 981], \"label\": \"Text\"}, {\"bbox_2d\": [526, 554, 895, 602], \"label\": \"Text\"}, {\"bbox_2d\": [526, 607, 895, 812], \"label\": \"Text\"}, {\"bbox_2d\": [526, 819, 895, 981], \"label\": \"Text\"}, {\"bbox_2d\": [553, 485, 894, 532], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 143, 895, 454], \"label\": \"Text\"}, {\"bbox_2d\": [164, 162, 434, 289], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [779, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 523, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 142, 895, 387], \"label\": \"Text\"}, {\"bbox_2d\": [526, 390, 894, 452], \"label\": \"Text\"}, {\"bbox_2d\": [130, 315, 501, 521], \"label\": \"Text\"}, {\"bbox_2d\": [130, 526, 500, 640], \"label\": \"Text\"}, {\"bbox_2d\": [130, 644, 500, 837], \"label\": \"Text\"}, {\"bbox_2d\": [130, 841, 502, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 815, 896, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 601, 898, 812], \"label\": \"Text\"}, {\"bbox_2d\": [526, 548, 894, 602], \"label\": \"Text\"}, {\"bbox_2d\": [158, 156, 500, 292], \"label\": \"Picture\"}, {\"bbox_2d\": [554, 480, 896, 533], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [795, 42, 974, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [1004, 208, 1015, 335], \"label\": \"Page-header\"}, {\"bbox_2d\": [77, 193, 958, 898], \"label\": \"Table\"}, {\"bbox_2d\": [105, 360, 602, 379], \"label\": \"Text\"}, {\"bbox_2d\": [105, 358, 245, 359], \"label\": \"Section-header\"}, {\"bbox_2d\": [105, 888, 280, 891], \"label\": \"Section-header\"}, {\"bbox_2d\": [105, 712, 221, 724], \"label\": \"Section-header\"}, {\"bbox_2d\": [105, 648, 187, 660], \"label\": \"Section-header\"}, {\"bbox_2d\": [105, 408, 313, 419], \"label\": \"Section-header\"}, {\"bbox_2d\": [105, 325, 211, 332], \"label\": \"Section-header\"}, {\"bbox_2d\": [105, 242, 201, 253], \"label\": \"Section-header\"}, {\"bbox_2d\": [942, 994, 990, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [77, 909, 952, 944], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [795, 34, 976, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [1002, 206, 1018, 334], \"label\": \"Page-header\"}, {\"bbox_2d\": [77, 190, 956, 898], \"label\": \"Table\"}, {\"bbox_2d\": [78, 907, 955, 941], \"label\": \"Text\"}, {\"bbox_2d\": [943, 977, 1031, 1010], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 107, 919, 280], \"label\": \"Text\"}, {\"bbox_2d\": [122, 295, 914, 375], \"label\": \"Text\"}, {\"bbox_2d\": [122, 390, 914, 500], \"label\": \"Text\"}, {\"bbox_2d\": [122, 515, 914, 657], \"label\": \"Text\"}, {\"bbox_2d\": [122, 672, 914, 814], \"label\": \"Text\"}, {\"bbox_2d\": [122, 829, 914, 877], \"label\": \"Text\"}, {\"bbox_2d\": [122, 913, 892, 952], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [121, 107, 934, 280], \"label\": \"Text\"}, {\"bbox_2d\": [121, 295, 914, 374], \"label\": \"Text\"}, {\"bbox_2d\": [121, 390, 914, 500], \"label\": \"Text\"}, {\"bbox_2d\": [121, 515, 914, 657], \"label\": \"Text\"}, {\"bbox_2d\": [121, 672, 914, 814], \"label\": \"Text\"}, {\"bbox_2d\": [121, 829, 913, 876], \"label\": \"Text\"}, {\"bbox_2d\": [121, 910, 897, 954], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 215, 110], \"label\": \"Text\"}, {\"bbox_2d\": [121, 167, 911, 325], \"label\": \"Text\"}, {\"bbox_2d\": [121, 332, 911, 430], \"label\": \"Text\"}, {\"bbox_2d\": [121, 505, 911, 548], \"label\": \"Text\"}, {\"bbox_2d\": [121, 618, 911, 745], \"label\": \"Text\"}, {\"bbox_2d\": [121, 755, 911, 935], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [218, 112, 911, 156], \"label\": \"Formula\"}, {\"bbox_2d\": [329, 466, 911, 487], \"label\": \"Formula\"}, {\"bbox_2d\": [288, 574, 911, 593], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 216, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 163, 911, 321], \"label\": \"Text\"}, {\"bbox_2d\": [121, 333, 911, 431], \"label\": \"Text\"}, {\"bbox_2d\": [121, 506, 910, 549], \"label\": \"Text\"}, {\"bbox_2d\": [121, 619, 911, 744], \"label\": \"Text\"}, {\"bbox_2d\": [121, 756, 911, 936], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [330, 470, 911, 485], \"label\": \"Formula\"}, {\"bbox_2d\": [290, 573, 911, 592], \"label\": \"Formula\"}, {\"bbox_2d\": [322, 112, 911, 162], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [75, 90, 568, 104], \"label\": \"Page-header\"}, {\"bbox_2d\": [689, 45, 963, 106], \"label\": \"Picture\"}, {\"bbox_2d\": [73, 136, 910, 315], \"label\": \"Title\"}, {\"bbox_2d\": [73, 336, 490, 730], \"label\": \"Text\"}, {\"bbox_2d\": [73, 769, 252, 807], \"label\": \"Picture\"}, {\"bbox_2d\": [78, 844, 355, 881], \"label\": \"Picture\"}, {\"bbox_2d\": [69, 957, 221, 1004], \"label\": \"Picture\"}, {\"bbox_2d\": [780, 972, 966, 1003], \"label\": \"Picture\"}, {\"bbox_2d\": [676, 970, 968, 1003], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [75, 144, 928, 315], \"label\": \"Title\"}, {\"bbox_2d\": [75, 336, 186, 351], \"label\": \"Text\"}, {\"bbox_2d\": [324, 336, 518, 351], \"label\": \"Text\"}, {\"bbox_2d\": [324, 368, 430, 383], \"label\": \"Text\"}, {\"bbox_2d\": [324, 399, 487, 414], \"label\": \"Text\"}, {\"bbox_2d\": [324, 431, 430, 446], \"label\": \"Text\"}, {\"bbox_2d\": [75, 368, 180, 383], \"label\": \"Text\"}, {\"bbox_2d\": [75, 399, 261, 414], \"label\": \"Text\"}, {\"bbox_2d\": [75, 431, 177, 446], \"label\": \"Text\"}, {\"bbox_2d\": [75, 462, 170, 477], \"label\": \"Text\"}, {\"bbox_2d\": [75, 493, 185, 508], \"label\": \"Text\"}, {\"bbox_2d\": [75, 525, 243, 540], \"label\": \"Text\"}, {\"bbox_2d\": [75, 556, 269, 571], \"label\": \"Text\"}, {\"bbox_2d\": [75, 587, 224, 603], \"label\": \"Text\"}, {\"bbox_2d\": [75, 619, 172, 634], \"label\": \"Text\"}, {\"bbox_2d\": [75, 650, 212, 665], \"label\": \"Text\"}, {\"bbox_2d\": [75, 682, 191, 697], \"label\": \"Text\"}, {\"bbox_2d\": [75, 713, 167, 728], \"label\": \"Text\"}, {\"bbox_2d\": [-1, 750, 396, 813], \"label\": \"Picture\"}, {\"bbox_2d\": [-1, 828, 396, 893], \"label\": \"Picture\"}, {\"bbox_2d\": [76, 959, 217, 1004], \"label\": \"Picture\"}, {\"bbox_2d\": [778, 969, 970, 1002], \"label\": \"Picture\"}, {\"bbox_2d\": [690, 46, 966, 105], \"label\": \"Picture\"}, {\"bbox_2d\": [76, 91, 568, 102], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [88, 68, 508, 293], \"label\": \"Picture\"}, {\"bbox_2d\": [537, 66, 955, 304], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 310, 506, 335], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 314, 951, 354], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 367, 506, 411], \"label\": \"Text\"}, {\"bbox_2d\": [91, 413, 506, 472], \"label\": \"Text\"}, {\"bbox_2d\": [91, 523, 506, 582], \"label\": \"Text\"}, {\"bbox_2d\": [91, 637, 506, 712], \"label\": \"Text\"}, {\"bbox_2d\": [91, 715, 506, 934], \"label\": \"Text\"}, {\"bbox_2d\": [536, 583, 951, 835], \"label\": \"Text\"}, {\"bbox_2d\": [536, 836, 951, 888], \"label\": \"Text\"}, {\"bbox_2d\": [536, 903, 951, 931], \"label\": \"Text\"}, {\"bbox_2d\": [536, 934, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [611, 514, 873, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [608, 547, 879, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [252, 484, 506, 514], \"label\": \"Formula\"}, {\"bbox_2d\": [96, 591, 506, 626], \"label\": \"Formula\"}, {\"bbox_2d\": [96, 724, 506, 749], \"label\": \"Formula\"}, {\"bbox_2d\": [96, 892, 440, 935], \"label\": \"Formula\"}, {\"bbox_2d\": [657, 854, 951, 892], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [91, 310, 506, 336], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 316, 951, 355], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 366, 506, 409], \"label\": \"Text\"}, {\"bbox_2d\": [91, 412, 506, 469], \"label\": \"Text\"}, {\"bbox_2d\": [91, 522, 506, 589], \"label\": \"Text\"}, {\"bbox_2d\": [91, 638, 506, 720], \"label\": \"Text\"}, {\"bbox_2d\": [91, 713, 506, 906], \"label\": \"Text\"}, {\"bbox_2d\": [536, 902, 951, 930], \"label\": \"Text\"}, {\"bbox_2d\": [553, 934, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 582, 951, 805], \"label\": \"Text\"}, {\"bbox_2d\": [536, 808, 951, 836], \"label\": \"Text\"}, {\"bbox_2d\": [536, 389, 951, 477], \"label\": \"Text\"}, {\"bbox_2d\": [612, 516, 875, 535], \"label\": \"Section-header\"}, {\"bbox_2d\": [609, 549, 878, 560], \"label\": \"Section-header\"}, {\"bbox_2d\": [658, 852, 951, 885], \"label\": \"Formula\"}, {\"bbox_2d\": [263, 916, 506, 951], \"label\": \"Formula\"}, {\"bbox_2d\": [96, 593, 509, 633], \"label\": \"Formula\"}, {\"bbox_2d\": [195, 482, 506, 512], \"label\": \"Formula\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 68, 507, 295], \"label\": \"Picture\"}, {\"bbox_2d\": [537, 67, 951, 298], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [150, 91, 664, 104], \"label\": \"Text\"}, {\"bbox_2d\": [150, 117, 614, 130], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 143, 910, 232], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 245, 910, 282], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 295, 910, 332], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 345, 910, 410], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 423, 773, 436], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 449, 788, 461], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 474, 910, 537], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 550, 910, 614], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 627, 910, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 678, 910, 716], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 729, 910, 794], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 807, 910, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [150, 882, 910, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [213, 933, 910, 945], \"label\": \"Text\"}, {\"bbox_2d\": [521, 979, 540, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [150, 52, 280, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [734, 52, 890, 63], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [150, 92, 664, 105], \"label\": \"Text\"}, {\"bbox_2d\": [151, 882, 910, 920], \"label\": \"Text\"}, {\"bbox_2d\": [213, 932, 911, 946], \"label\": \"Text\"}, {\"bbox_2d\": [150, 805, 911, 869], \"label\": \"Text\"}, {\"bbox_2d\": [150, 729, 911, 793], \"label\": \"Text\"}, {\"bbox_2d\": [151, 703, 180, 716], \"label\": \"Text\"}, {\"bbox_2d\": [151, 627, 910, 691], \"label\": \"Text\"}, {\"bbox_2d\": [150, 551, 911, 615], \"label\": \"Text\"}, {\"bbox_2d\": [151, 347, 911, 411], \"label\": \"Text\"}, {\"bbox_2d\": [151, 296, 911, 334], \"label\": \"Text\"}, {\"bbox_2d\": [150, 245, 911, 283], \"label\": \"Text\"}, {\"bbox_2d\": [151, 118, 615, 131], \"label\": \"Text\"}, {\"bbox_2d\": [150, 143, 911, 233], \"label\": \"Text\"}, {\"bbox_2d\": [151, 423, 774, 436], \"label\": \"Text\"}, {\"bbox_2d\": [213, 449, 910, 487], \"label\": \"Text\"}, {\"bbox_2d\": [150, 500, 912, 538], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [728, 71, 936, 103], \"label\": \"Page-header\"}, {\"bbox_2d\": [163, 115, 864, 976], \"label\": \"Picture\"}, {\"bbox_2d\": [364, 988, 666, 1020], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [730, 72, 946, 103], \"label\": \"Page-header\"}, {\"bbox_2d\": [164, 110, 860, 979], \"label\": \"Picture\"}, {\"bbox_2d\": [361, 986, 657, 1002], \"label\": \"Caption\"}, {\"bbox_2d\": [494, 1009, 527, 1024], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 137], \"label\": \"Text\"}, {\"bbox_2d\": [121, 227, 911, 543], \"label\": \"Text\"}, {\"bbox_2d\": [121, 554, 911, 708], \"label\": \"Text\"}, {\"bbox_2d\": [121, 719, 911, 952], \"label\": \"Text\"}, {\"bbox_2d\": [121, 183, 490, 195], \"label\": \"Section-header\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 226, 911, 543], \"label\": \"Text\"}, {\"bbox_2d\": [121, 553, 911, 707], \"label\": \"Text\"}, {\"bbox_2d\": [121, 719, 911, 953], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 182, 491, 195], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 122, 763, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 216, 422, 233], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 166, 918, 194], \"label\": \"Text\"}, {\"bbox_2d\": [292, 246, 913, 273], \"label\": \"Text\"}, {\"bbox_2d\": [292, 280, 744, 293], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 300, 577, 312], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 332, 775, 344], \"label\": \"Text\"}, {\"bbox_2d\": [292, 686, 857, 698], \"label\": \"Text\"}, {\"bbox_2d\": [292, 790, 924, 833], \"label\": \"Text\"}, {\"bbox_2d\": [292, 709, 575, 757], \"label\": \"Table\"}, {\"bbox_2d\": [292, 356, 786, 650], \"label\": \"Text\"}, {\"bbox_2d\": [913, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 764, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 217, 422, 234], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 166, 913, 195], \"label\": \"Text\"}, {\"bbox_2d\": [293, 245, 910, 274], \"label\": \"Text\"}, {\"bbox_2d\": [293, 332, 776, 345], \"label\": \"Text\"}, {\"bbox_2d\": [293, 281, 738, 294], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 300, 576, 313], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 687, 857, 700], \"label\": \"Text\"}, {\"bbox_2d\": [293, 790, 918, 834], \"label\": \"Text\"}, {\"bbox_2d\": [317, 356, 783, 650], \"label\": \"Text\"}, {\"bbox_2d\": [913, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 713, 571, 758], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [52, 54, 629, 76], \"label\": \"Section-header\"}, {\"bbox_2d\": [52, 83, 204, 95], \"label\": \"Text\"}, {\"bbox_2d\": [51, 110, 978, 741], \"label\": \"Table\"}, {\"bbox_2d\": [52, 756, 734, 767], \"label\": \"Text\"}, {\"bbox_2d\": [814, 1002, 1005, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [53, 56, 629, 78], \"label\": \"Section-header\"}, {\"bbox_2d\": [53, 84, 203, 94], \"label\": \"Text\"}, {\"bbox_2d\": [53, 756, 733, 767], \"label\": \"Text\"}, {\"bbox_2d\": [867, 1003, 1004, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [815, 1002, 861, 1012], \"label\": \"Picture\"}, {\"bbox_2d\": [49, 106, 990, 741], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 67, 951, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [569, 71, 951, 96], \"label\": \"Text\"}, {\"bbox_2d\": [91, 71, 506, 96], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 99, 506, 124], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 127, 506, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 156, 506, 194], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 197, 506, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 238, 506, 263], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 266, 506, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 294, 506, 319], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 100, 951, 124], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 127, 951, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 156, 951, 194], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 197, 951, 222], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 226, 951, 263], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 266, 951, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 284, 951, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 312, 951, 337], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 72, 506, 97], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 99, 506, 125], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 127, 506, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 154, 506, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 195, 506, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 236, 506, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 263, 506, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 291, 506, 316], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 99, 951, 125], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 127, 951, 152], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 154, 951, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 195, 951, 220], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 222, 951, 248], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 250, 951, 275], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 277, 951, 303], \"label\": \"List-item\"}, {\"bbox_2d\": [567, 72, 951, 97], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [238, 134, 906, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [1017, 140, 1029, 154], \"label\": \"Page-header\"}, {\"bbox_2d\": [513, 234, 903, 403], \"label\": \"Text\"}, {\"bbox_2d\": [513, 435, 904, 560], \"label\": \"Text\"}, {\"bbox_2d\": [88, 257, 461, 404], \"label\": \"Text\"}, {\"bbox_2d\": [513, 592, 903, 625], \"label\": \"Text\"}, {\"bbox_2d\": [88, 460, 474, 605], \"label\": \"Text\"}, {\"bbox_2d\": [513, 638, 903, 784], \"label\": \"Text\"}, {\"bbox_2d\": [88, 749, 477, 874], \"label\": \"Text\"}, {\"bbox_2d\": [513, 816, 901, 917], \"label\": \"Text\"}, {\"bbox_2d\": [88, 638, 478, 717], \"label\": \"Text\"}, {\"bbox_2d\": [88, 435, 208, 448], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 234, 277, 248], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 749, 208, 762], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [237, 126, 907, 158], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 231, 278, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 433, 209, 450], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 253, 470, 405], \"label\": \"Text\"}, {\"bbox_2d\": [87, 454, 474, 606], \"label\": \"Text\"}, {\"bbox_2d\": [87, 633, 478, 718], \"label\": \"Text\"}, {\"bbox_2d\": [87, 745, 476, 875], \"label\": \"Text\"}, {\"bbox_2d\": [512, 229, 903, 404], \"label\": \"Text\"}, {\"bbox_2d\": [512, 431, 904, 561], \"label\": \"Text\"}, {\"bbox_2d\": [512, 587, 903, 784], \"label\": \"Text\"}, {\"bbox_2d\": [512, 811, 899, 919], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [573, 69, 840, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [697, 140, 938, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 140, 726, 195], \"label\": \"Text\"}, {\"bbox_2d\": [151, 225, 837, 257], \"label\": \"Text\"}, {\"bbox_2d\": [151, 288, 838, 342], \"label\": \"Text\"}, {\"bbox_2d\": [151, 373, 838, 447], \"label\": \"Text\"}, {\"bbox_2d\": [150, 707, 544, 889], \"label\": \"Picture\"}, {\"bbox_2d\": [903, 947, 922, 957], \"label\": \"Page-footer\"}, {\"bbox_2d\": [895, 965, 998, 1019], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [151, 225, 838, 258], \"label\": \"Text\"}, {\"bbox_2d\": [151, 288, 838, 343], \"label\": \"Text\"}, {\"bbox_2d\": [151, 373, 838, 448], \"label\": \"Text\"}, {\"bbox_2d\": [147, 706, 544, 889], \"label\": \"Picture\"}, {\"bbox_2d\": [151, 140, 666, 152], \"label\": \"Text\"}, {\"bbox_2d\": [151, 161, 838, 195], \"label\": \"Text\"}, {\"bbox_2d\": [906, 948, 927, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [701, 142, 941, 152], \"label\": \"Text\"}, {\"bbox_2d\": [575, 69, 844, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [894, 959, 1000, 1022], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [169, 563, 749, 582], \"label\": \"Text\"}, {\"bbox_2d\": [253, 616, 575, 635], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [169, 564, 778, 582], \"label\": \"Text\"}, {\"bbox_2d\": [253, 617, 576, 634], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [460, 47, 968, 112], \"label\": \"Text\"}, {\"bbox_2d\": [460, 149, 967, 320], \"label\": \"Text\"}, {\"bbox_2d\": [460, 328, 974, 499], \"label\": \"Text\"}, {\"bbox_2d\": [460, 507, 972, 609], \"label\": \"Text\"}, {\"bbox_2d\": [460, 643, 532, 656], \"label\": \"Text\"}, {\"bbox_2d\": [170, 527, 393, 675], \"label\": \"Text\"}, {\"bbox_2d\": [170, 437, 347, 481], \"label\": \"Text\"}, {\"bbox_2d\": [170, 270, 385, 391], \"label\": \"Text\"}, {\"bbox_2d\": [170, 154, 399, 223], \"label\": \"Text\"}, {\"bbox_2d\": [460, 127, 728, 138], \"label\": \"Section-header\"}, {\"bbox_2d\": [170, 48, 361, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [460, 724, 538, 735], \"label\": \"Caption\"}, {\"bbox_2d\": [458, 667, 622, 724], \"label\": \"Picture\"}, {\"bbox_2d\": [749, 987, 975, 998], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [169, 40, 363, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [460, 721, 537, 736], \"label\": \"Caption\"}, {\"bbox_2d\": [170, 150, 400, 225], \"label\": \"Text\"}, {\"bbox_2d\": [170, 265, 385, 392], \"label\": \"Text\"}, {\"bbox_2d\": [170, 433, 347, 481], \"label\": \"Text\"}, {\"bbox_2d\": [170, 522, 399, 675], \"label\": \"Text\"}, {\"bbox_2d\": [460, 44, 973, 113], \"label\": \"Text\"}, {\"bbox_2d\": [460, 145, 970, 319], \"label\": \"Text\"}, {\"bbox_2d\": [460, 325, 976, 500], \"label\": \"Text\"}, {\"bbox_2d\": [460, 506, 976, 610], \"label\": \"Text\"}, {\"bbox_2d\": [460, 124, 729, 140], \"label\": \"Section-header\"}, {\"bbox_2d\": [460, 640, 530, 656], \"label\": \"Text\"}, {\"bbox_2d\": [447, 656, 623, 720], \"label\": \"Picture\"}, {\"bbox_2d\": [749, 987, 976, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 83, 685, 174], \"label\": \"Text\"}, {\"bbox_2d\": [281, 188, 456, 201], \"label\": \"Text\"}, {\"bbox_2d\": [273, 256, 789, 743], \"label\": \"Picture\"}, {\"bbox_2d\": [274, 753, 475, 767], \"label\": \"Caption\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [272, 755, 476, 766], \"label\": \"Caption\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 83, 577, 96], \"label\": \"Text\"}, {\"bbox_2d\": [281, 109, 516, 135], \"label\": \"Text\"}, {\"bbox_2d\": [273, 148, 684, 174], \"label\": \"Text\"}, {\"bbox_2d\": [281, 187, 456, 201], \"label\": \"Text\"}, {\"bbox_2d\": [273, 255, 788, 742], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [574, 67, 843, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [150, 138, 843, 196], \"label\": \"Text\"}, {\"bbox_2d\": [150, 222, 843, 302], \"label\": \"Text\"}, {\"bbox_2d\": [150, 328, 843, 409], \"label\": \"Text\"}, {\"bbox_2d\": [150, 434, 843, 555], \"label\": \"Text\"}, {\"bbox_2d\": [150, 582, 843, 747], \"label\": \"Text\"}, {\"bbox_2d\": [150, 773, 843, 937], \"label\": \"Text\"}, {\"bbox_2d\": [915, 945, 927, 957], \"label\": \"Page-footer\"}, {\"bbox_2d\": [892, 961, 1002, 1021], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [574, 69, 844, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [151, 140, 838, 195], \"label\": \"Text\"}, {\"bbox_2d\": [151, 225, 838, 300], \"label\": \"Text\"}, {\"bbox_2d\": [151, 331, 838, 406], \"label\": \"Text\"}, {\"bbox_2d\": [151, 437, 838, 554], \"label\": \"Text\"}, {\"bbox_2d\": [151, 585, 838, 745], \"label\": \"Text\"}, {\"bbox_2d\": [151, 775, 838, 935], \"label\": \"Text\"}, {\"bbox_2d\": [882, 958, 1001, 1022], \"label\": \"Picture\"}, {\"bbox_2d\": [917, 948, 924, 958], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 84, 724, 98], \"label\": \"Text\"}, {\"bbox_2d\": [339, 103, 887, 134], \"label\": \"Text\"}, {\"bbox_2d\": [339, 462, 775, 476], \"label\": \"Text\"}, {\"bbox_2d\": [273, 482, 909, 512], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 540, 914, 634], \"label\": \"Text\"}, {\"bbox_2d\": [273, 651, 890, 679], \"label\": \"Text\"}, {\"bbox_2d\": [273, 524, 319, 539], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 979, 379, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [338, 145, 911, 436], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 723, 98], \"label\": \"List-item\"}, {\"bbox_2d\": [339, 104, 884, 133], \"label\": \"Text\"}, {\"bbox_2d\": [339, 463, 774, 477], \"label\": \"Text\"}, {\"bbox_2d\": [273, 483, 906, 512], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 542, 912, 633], \"label\": \"Text\"}, {\"bbox_2d\": [273, 651, 885, 680], \"label\": \"Text\"}, {\"bbox_2d\": [273, 526, 313, 540], \"label\": \"Section-header\"}, {\"bbox_2d\": [363, 144, 890, 415], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 97, 600, 110], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 131, 421, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 572, 536, 584], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 603, 904, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [166, 636, 610, 742], \"label\": \"Text\"}, {\"bbox_2d\": [166, 763, 913, 792], \"label\": \"Text\"}, {\"bbox_2d\": [166, 810, 913, 840], \"label\": \"Text\"}, {\"bbox_2d\": [497, 940, 536, 953], \"label\": \"Page-footer\"}, {\"bbox_2d\": [166, 316, 913, 360], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 380, 913, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 475, 913, 552], \"label\": \"List-item\"}, {\"bbox_2d\": [151, 155, 917, 295], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 605, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 425, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 568, 540, 585], \"label\": \"Section-header\"}, {\"bbox_2d\": [167, 760, 919, 792], \"label\": \"Text\"}, {\"bbox_2d\": [167, 808, 940, 840], \"label\": \"Text\"}, {\"bbox_2d\": [152, 313, 919, 361], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 377, 919, 457], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 472, 920, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [153, 150, 913, 298], \"label\": \"Table\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [165, 632, 616, 745], \"label\": \"Table\"}, {\"bbox_2d\": [137, 600, 908, 633], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 88, 691, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 118, 932, 146], \"label\": \"Text\"}, {\"bbox_2d\": [292, 165, 898, 192], \"label\": \"Text\"}, {\"bbox_2d\": [344, 212, 924, 239], \"label\": \"Text\"}, {\"bbox_2d\": [292, 312, 529, 325], \"label\": \"Text\"}, {\"bbox_2d\": [292, 707, 928, 735], \"label\": \"Text\"}, {\"bbox_2d\": [592, 979, 932, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [294, 655, 517, 668], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 676, 408, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 246, 892, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 283, 408, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 118, 932, 146], \"label\": \"Text\"}, {\"bbox_2d\": [294, 283, 408, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 676, 408, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 246, 892, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 283, 408, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 312, 529, 325], \"label\": \"Text\"}, {\"bbox_2d\": [294, 331, 929, 641], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [121, 89, 691, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 118, 932, 146], \"label\": \"Text\"}, {\"bbox_2d\": [344, 211, 923, 240], \"label\": \"Text\"}, {\"bbox_2d\": [293, 313, 529, 327], \"label\": \"Text\"}, {\"bbox_2d\": [293, 245, 891, 276], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 282, 408, 295], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 165, 898, 193], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 657, 518, 670], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 676, 408, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [493, 338, 675, 349], \"label\": \"Section-header\"}, {\"bbox_2d\": [314, 360, 806, 526], \"label\": \"Table\"}, {\"bbox_2d\": [317, 608, 540, 620], \"label\": \"Text\"}, {\"bbox_2d\": [293, 705, 927, 736], \"label\": \"Text\"}, {\"bbox_2d\": [593, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 83, 407, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 103, 527, 117], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 123, 448, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 155, 595, 168], \"label\": \"Text\"}, {\"bbox_2d\": [273, 540, 552, 554], \"label\": \"Text\"}, {\"bbox_2d\": [197, 510, 553, 524], \"label\": \"Text\"}, {\"bbox_2d\": [197, 511, 686, 589], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 603, 777, 888], \"label\": \"Text\"}, {\"bbox_2d\": [197, 979, 371, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 174, 910, 482], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 406, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 573, 687, 590], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 603, 608, 614], \"label\": \"Section-header\"}, {\"bbox_2d\": [493, 180, 621, 192], \"label\": \"Section-header\"}, {\"bbox_2d\": [297, 873, 520, 885], \"label\": \"Text\"}, {\"bbox_2d\": [297, 450, 533, 462], \"label\": \"Text\"}, {\"bbox_2d\": [297, 203, 472, 215], \"label\": \"Text\"}, {\"bbox_2d\": [273, 155, 594, 169], \"label\": \"Text\"}, {\"bbox_2d\": [273, 102, 527, 118], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 124, 449, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 510, 553, 524], \"label\": \"Text\"}, {\"bbox_2d\": [273, 542, 551, 555], \"label\": \"Text\"}, {\"bbox_2d\": [297, 697, 472, 708], \"label\": \"Text\"}, {\"bbox_2d\": [296, 625, 656, 675], \"label\": \"Table\"}, {\"bbox_2d\": [309, 720, 777, 861], \"label\": \"Table\"}, {\"bbox_2d\": [310, 225, 791, 309], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [497, 101, 911, 284], \"label\": \"Text\"}, {\"bbox_2d\": [497, 289, 911, 716], \"label\": \"Text\"}, {\"bbox_2d\": [497, 721, 911, 974], \"label\": \"Text\"}, {\"bbox_2d\": [464, 988, 492, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [40, 879, 455, 974], \"label\": \"Text\"}, {\"bbox_2d\": [39, 103, 455, 213], \"label\": \"Text\"}, {\"bbox_2d\": [40, 217, 455, 451], \"label\": \"Text\"}, {\"bbox_2d\": [39, 456, 455, 873], \"label\": \"Text\"}, {\"bbox_2d\": [45, 60, 280, 87], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [44, 56, 294, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [39, 98, 465, 210], \"label\": \"Text\"}, {\"bbox_2d\": [496, 97, 923, 283], \"label\": \"Text\"}, {\"bbox_2d\": [38, 211, 472, 450], \"label\": \"Text\"}, {\"bbox_2d\": [496, 284, 926, 714], \"label\": \"Text\"}, {\"bbox_2d\": [39, 451, 466, 868], \"label\": \"Text\"}, {\"bbox_2d\": [496, 714, 918, 970], \"label\": \"Text\"}, {\"bbox_2d\": [40, 871, 464, 970], \"label\": \"Text\"}, {\"bbox_2d\": [463, 978, 502, 996], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 34, 410, 88], \"label\": \"Picture\"}, {\"bbox_2d\": [216, 139, 902, 180], \"label\": \"List-item\"}, {\"bbox_2d\": [325, 192, 902, 234], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 244, 902, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 282, 902, 323], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 334, 902, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 387, 902, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 439, 768, 451], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 463, 902, 505], \"label\": \"List-item\"}, {\"bbox_2d\": [289, 515, 902, 543], \"label\": \"List-item\"}, {\"bbox_2d\": [325, 554, 902, 581], \"label\": \"List-item\"}, {\"bbox_2d\": [325, 592, 902, 662], \"label\": \"List-item\"}, {\"bbox_2d\": [325, 673, 902, 730], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 739, 902, 766], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 778, 902, 805], \"label\": \"List-item\"}, {\"bbox_2d\": [366, 815, 902, 885], \"label\": \"List-item\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [123, 136, 912, 895], \"label\": \"Table\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [123, 34, 409, 86], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 122, 396, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [118, 158, 934, 952], \"label\": \"Table\"}, {\"bbox_2d\": [912, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 396, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 160, 949, 953], \"label\": \"Table\"}, {\"bbox_2d\": [913, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 45, 979, 826], \"label\": \"Table\"}, {\"bbox_2d\": [121, 827, 507, 839], \"label\": \"Text\"}, {\"bbox_2d\": [750, 987, 975, 998], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [130, 46, 570, 85], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 103, 329, 117], \"label\": \"Section-header\"}, {\"bbox_2d\": [124, 119, 977, 825], \"label\": \"Table\"}, {\"bbox_2d\": [121, 825, 506, 839], \"label\": \"Text\"}, {\"bbox_2d\": [749, 987, 976, 997], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [526, 37, 546, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 77, 957, 151], \"label\": \"Text\"}, {\"bbox_2d\": [115, 157, 957, 271], \"label\": \"Text\"}, {\"bbox_2d\": [115, 276, 957, 329], \"label\": \"Text\"}, {\"bbox_2d\": [115, 334, 957, 412], \"label\": \"Text\"}, {\"bbox_2d\": [115, 413, 957, 466], \"label\": \"Text\"}, {\"bbox_2d\": [115, 472, 957, 525], \"label\": \"Text\"}, {\"bbox_2d\": [115, 529, 957, 565], \"label\": \"Text\"}, {\"bbox_2d\": [115, 569, 957, 605], \"label\": \"Text\"}, {\"bbox_2d\": [115, 609, 957, 645], \"label\": \"Text\"}, {\"bbox_2d\": [115, 649, 957, 744], \"label\": \"Text\"}, {\"bbox_2d\": [115, 748, 957, 782], \"label\": \"Text\"}, {\"bbox_2d\": [115, 786, 957, 819], \"label\": \"Text\"}, {\"bbox_2d\": [115, 823, 957, 882], \"label\": \"Text\"}, {\"bbox_2d\": [115, 886, 957, 922], \"label\": \"Text\"}, {\"bbox_2d\": [115, 925, 957, 940], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [528, 40, 545, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 76, 959, 152], \"label\": \"Text\"}, {\"bbox_2d\": [115, 155, 960, 270], \"label\": \"Text\"}, {\"bbox_2d\": [116, 274, 960, 329], \"label\": \"Text\"}, {\"bbox_2d\": [116, 332, 961, 407], \"label\": \"Text\"}, {\"bbox_2d\": [116, 411, 960, 466], \"label\": \"Text\"}, {\"bbox_2d\": [116, 469, 960, 523], \"label\": \"Text\"}, {\"bbox_2d\": [116, 528, 960, 565], \"label\": \"Text\"}, {\"bbox_2d\": [116, 568, 960, 604], \"label\": \"Text\"}, {\"bbox_2d\": [162, 607, 697, 624], \"label\": \"Text\"}, {\"bbox_2d\": [162, 627, 709, 644], \"label\": \"Text\"}, {\"bbox_2d\": [116, 647, 960, 742], \"label\": \"Text\"}, {\"bbox_2d\": [116, 745, 961, 780], \"label\": \"Text\"}, {\"bbox_2d\": [116, 784, 961, 819], \"label\": \"Text\"}, {\"bbox_2d\": [116, 823, 961, 880], \"label\": \"Text\"}, {\"bbox_2d\": [116, 883, 961, 919], \"label\": \"Text\"}, {\"bbox_2d\": [162, 925, 961, 941], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [35, 58, 217, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [32, 101, 446, 229], \"label\": \"Text\"}, {\"bbox_2d\": [32, 237, 444, 417], \"label\": \"Text\"}, {\"bbox_2d\": [32, 425, 443, 700], \"label\": \"Text\"}, {\"bbox_2d\": [27, 707, 442, 871], \"label\": \"Text\"}, {\"bbox_2d\": [36, 877, 439, 967], \"label\": \"Text\"}, {\"bbox_2d\": [479, 842, 868, 856], \"label\": \"Text\"}, {\"bbox_2d\": [479, 906, 855, 929], \"label\": \"Text\"}, {\"bbox_2d\": [479, 629, 891, 754], \"label\": \"Text\"}, {\"bbox_2d\": [479, 535, 891, 623], \"label\": \"Text\"}, {\"bbox_2d\": [479, 217, 892, 528], \"label\": \"Text\"}, {\"bbox_2d\": [504, 103, 892, 211], \"label\": \"Text\"}, {\"bbox_2d\": [479, 784, 790, 798], \"label\": \"Text\"}, {\"bbox_2d\": [553, 807, 737, 835], \"label\": \"Text\"}, {\"bbox_2d\": [535, 761, 652, 771], \"label\": \"Text\"}, {\"bbox_2d\": [524, 861, 635, 927], \"label\": \"Formula\"}, {\"bbox_2d\": [567, 936, 840, 967], \"label\": \"Formula\"}, {\"bbox_2d\": [453, 981, 461, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [34, 57, 231, 88], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 99, 902, 210], \"label\": \"Text\"}, {\"bbox_2d\": [32, 99, 455, 228], \"label\": \"Text\"}, {\"bbox_2d\": [499, 212, 906, 549], \"label\": \"Text\"}, {\"bbox_2d\": [30, 232, 452, 417], \"label\": \"Text\"}, {\"bbox_2d\": [28, 421, 450, 698], \"label\": \"Text\"}, {\"bbox_2d\": [480, 550, 897, 623], \"label\": \"Text\"}, {\"bbox_2d\": [476, 624, 904, 754], \"label\": \"Text\"}, {\"bbox_2d\": [531, 757, 657, 775], \"label\": \"Text\"}, {\"bbox_2d\": [478, 779, 798, 796], \"label\": \"Text\"}, {\"bbox_2d\": [558, 800, 704, 818], \"label\": \"Text\"}, {\"bbox_2d\": [564, 820, 745, 837], \"label\": \"Text\"}, {\"bbox_2d\": [478, 838, 872, 855], \"label\": \"Text\"}, {\"bbox_2d\": [527, 850, 631, 883], \"label\": \"Formula\"}, {\"bbox_2d\": [477, 903, 864, 921], \"label\": \"Text\"}, {\"bbox_2d\": [568, 933, 843, 950], \"label\": \"Text\"}, {\"bbox_2d\": [563, 951, 775, 969], \"label\": \"Text\"}, {\"bbox_2d\": [27, 702, 452, 868], \"label\": \"Text\"}, {\"bbox_2d\": [46, 874, 445, 966], \"label\": \"Text\"}, {\"bbox_2d\": [452, 976, 468, 992], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [785, 62, 987, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [572, 108, 994, 194], \"label\": \"Text\"}, {\"bbox_2d\": [122, 108, 545, 326], \"label\": \"Text\"}, {\"bbox_2d\": [122, 331, 545, 721], \"label\": \"Text\"}, {\"bbox_2d\": [577, 200, 993, 345], \"label\": \"Text\"}, {\"bbox_2d\": [577, 350, 994, 625], \"label\": \"Text\"}, {\"bbox_2d\": [577, 630, 994, 857], \"label\": \"Text\"}, {\"bbox_2d\": [577, 863, 994, 976], \"label\": \"Text\"}, {\"bbox_2d\": [122, 726, 545, 976], \"label\": \"Text\"}, {\"bbox_2d\": [544, 987, 570, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [785, 61, 1001, 92], \"label\": \"Page-header\"}, {\"bbox_2d\": [577, 103, 998, 195], \"label\": \"Text\"}, {\"bbox_2d\": [577, 196, 1003, 345], \"label\": \"Text\"}, {\"bbox_2d\": [124, 104, 548, 326], \"label\": \"Text\"}, {\"bbox_2d\": [124, 327, 549, 720], \"label\": \"Text\"}, {\"bbox_2d\": [578, 346, 1006, 627], \"label\": \"Text\"}, {\"bbox_2d\": [577, 628, 1003, 857], \"label\": \"Text\"}, {\"bbox_2d\": [123, 721, 551, 975], \"label\": \"Text\"}, {\"bbox_2d\": [577, 859, 1002, 976], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 89, 911, 178], \"label\": \"Text\"}, {\"bbox_2d\": [125, 189, 911, 275], \"label\": \"Text\"}, {\"bbox_2d\": [125, 368, 155, 383], \"label\": \"Text\"}, {\"bbox_2d\": [125, 501, 911, 614], \"label\": \"Text\"}, {\"bbox_2d\": [125, 800, 911, 844], \"label\": \"Caption\"}, {\"bbox_2d\": [125, 867, 911, 932], \"label\": \"Text\"}, {\"bbox_2d\": [508, 974, 525, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 303, 911, 347], \"label\": \"Formula\"}, {\"bbox_2d\": [251, 387, 911, 437], \"label\": \"Formula\"}, {\"bbox_2d\": [251, 444, 911, 493], \"label\": \"Formula\"}, {\"bbox_2d\": [319, 637, 706, 779], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [125, 91, 911, 177], \"label\": \"Text\"}, {\"bbox_2d\": [125, 189, 911, 275], \"label\": \"Text\"}, {\"bbox_2d\": [125, 370, 152, 383], \"label\": \"Text\"}, {\"bbox_2d\": [125, 503, 911, 614], \"label\": \"Text\"}, {\"bbox_2d\": [125, 870, 911, 931], \"label\": \"Text\"}, {\"bbox_2d\": [125, 802, 911, 844], \"label\": \"Caption\"}, {\"bbox_2d\": [361, 387, 911, 505], \"label\": \"Formula\"}, {\"bbox_2d\": [361, 303, 911, 354], \"label\": \"Formula\"}, {\"bbox_2d\": [509, 974, 527, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [322, 638, 705, 783], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 93, 316, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 316, 335, 329], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 476, 416, 488], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 111, 505, 135], \"label\": \"Text\"}, {\"bbox_2d\": [91, 141, 508, 294], \"label\": \"Table\"}, {\"bbox_2d\": [91, 333, 508, 406], \"label\": \"Table\"}, {\"bbox_2d\": [526, 111, 948, 248], \"label\": \"Table\"}, {\"bbox_2d\": [526, 264, 943, 301], \"label\": \"Text\"}, {\"bbox_2d\": [91, 437, 508, 461], \"label\": \"Text\"}, {\"bbox_2d\": [91, 491, 795, 503], \"label\": \"Text\"}, {\"bbox_2d\": [91, 659, 825, 671], \"label\": \"Text\"}, {\"bbox_2d\": [91, 677, 949, 959], \"label\": \"Table\"}, {\"bbox_2d\": [584, 998, 943, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [526, 262, 944, 301], \"label\": \"Text\"}, {\"bbox_2d\": [91, 496, 948, 648], \"label\": \"Table\"}, {\"bbox_2d\": [91, 507, 730, 627], \"label\": \"Table\"}, {\"bbox_2d\": [91, 111, 505, 135], \"label\": \"Text\"}, {\"bbox_2d\": [1006, 94, 1017, 339], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 343], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 316, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 476, 417, 486], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 436, 507, 461], \"label\": \"Text\"}, {\"bbox_2d\": [528, 263, 944, 302], \"label\": \"Text\"}, {\"bbox_2d\": [91, 493, 795, 503], \"label\": \"Text\"}, {\"bbox_2d\": [91, 660, 827, 670], \"label\": \"Text\"}, {\"bbox_2d\": [91, 112, 507, 136], \"label\": \"Text\"}, {\"bbox_2d\": [91, 317, 335, 327], \"label\": \"Section-header\"}, {\"bbox_2d\": [584, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [90, 676, 946, 960], \"label\": \"Table\"}, {\"bbox_2d\": [91, 509, 947, 646], \"label\": \"Table\"}, {\"bbox_2d\": [90, 143, 511, 294], \"label\": \"Table\"}, {\"bbox_2d\": [90, 335, 508, 422], \"label\": \"Table\"}, {\"bbox_2d\": [527, 112, 948, 249], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [496, 110, 910, 611], \"label\": \"Text\"}, {\"bbox_2d\": [52, 111, 459, 388], \"label\": \"Text\"}, {\"bbox_2d\": [496, 614, 910, 963], \"label\": \"Text\"}, {\"bbox_2d\": [496, 974, 496, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [52, 799, 462, 960], \"label\": \"Text\"}, {\"bbox_2d\": [49, 102, 460, 799], \"label\": \"Text\"}, {\"bbox_2d\": [52, 54, 330, 101], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [52, 52, 345, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [495, 108, 921, 610], \"label\": \"Text\"}, {\"bbox_2d\": [48, 110, 470, 386], \"label\": \"Text\"}, {\"bbox_2d\": [48, 387, 474, 798], \"label\": \"Text\"}, {\"bbox_2d\": [498, 610, 925, 959], \"label\": \"Text\"}, {\"bbox_2d\": [50, 800, 468, 956], \"label\": \"Text\"}, {\"bbox_2d\": [465, 968, 504, 985], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 918, 152], \"label\": \"Text\"}, {\"bbox_2d\": [231, 331, 614, 342], \"label\": \"Text\"}, {\"bbox_2d\": [231, 637, 926, 649], \"label\": \"Text\"}, {\"bbox_2d\": [231, 672, 886, 744], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 757, 481, 767], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 603, 500, 614], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 297, 484, 307], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 173, 921, 293], \"label\": \"Picture\"}, {\"bbox_2d\": [229, 363, 560, 600], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 987, 692, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 920, 152], \"label\": \"Text\"}, {\"bbox_2d\": [231, 331, 615, 343], \"label\": \"Text\"}, {\"bbox_2d\": [231, 637, 923, 649], \"label\": \"Text\"}, {\"bbox_2d\": [228, 667, 885, 752], \"label\": \"Picture\"}, {\"bbox_2d\": [232, 361, 561, 598], \"label\": \"Picture\"}, {\"bbox_2d\": [227, 171, 923, 291], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 297, 483, 308], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 603, 498, 614], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 755, 481, 766], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [46, 35, 156, 46], \"label\": \"Page-header\"}, {\"bbox_2d\": [95, 120, 729, 210], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 883, 959, 892], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 900, 987, 975], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 982, 995, 999], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 999, 994, 1008], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 1009, 995, 1018], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 1019, 995, 1028], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 1029, 995, 1038], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 1039, 994, 1048], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 1049, 994, 1058], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 1059, 995, 1068], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 1069, 995, 1078], \"label\": \"Section-header\"}, {\"bbox_2d\": [949, 332, 995, 976], \"label\": \"Text\"}, {\"bbox_2d\": [949, 977, 995, 986], \"label\": \"Text\"}, {\"bbox_2d\": [949, 987, 994, 996], \"label\": \"Text\"}, {\"bbox_2d\": [949, 997, 995, 1006], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1007, 994, 1016], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1017, 995, 1026], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1027, 995, 1036], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1037, 994, 1045], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1046, 994, 1055], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1056, 994, 1065], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1066, 995, 1075], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1076, 994, 1085], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1086, 994, 1095], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1096, 994, 1105], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1106, 995, 1115], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1116, 994, 1125], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1126, 995, 1135], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1136, 994, 1145], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1146, 995, 1155], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1156, 994, 1165], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1166, 994, 1175], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1176, 994, 1185], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1186, 994, 1195], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1196, 994, 1205], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1206, 995, 1215], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1216, 994, 1225], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1226, 995, 1235], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1236, 994, 1245], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1246, 994, 1255], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1256, 994, 1265], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1266, 994, 1275], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1276, 994, 1285], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1286, 994, 1295], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1296, 994, 1305], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1306, 994, 1315], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1316, 994, 1325], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1326, 994, 1335], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1336, 994, 1345], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1346, 994, 1355], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1356, 994, 1365], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1366, 994, 1375], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1376, 994, 1385], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1386, 994, 1395], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1396, 994, 1405], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1406, 994, 1415], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1416, 994, 1425], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1426, 994, 1435], \"label\": \"Text\"}, {\"bbox_2d\": [949,", "label": "```json[{\"bbox_2d\": [47, 35, 155, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [94, 118, 730, 217], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 235, 930, 292], \"label\": \"Text\"}, {\"bbox_2d\": [94, 331, 141, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 381, 142, 390], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 431, 139, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [200, 421, 255, 431], \"label\": \"Section-header\"}, {\"bbox_2d\": [306, 431, 361, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 491, 141, 501], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 531, 141, 541], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 602, 147, 611], \"label\": \"Section-header\"}, {\"bbox_2d\": [519, 461, 575, 471], \"label\": \"Section-header\"}, {\"bbox_2d\": [413, 431, 467, 441], \"label\": \"Section-header\"}, {\"bbox_2d\": [413, 672, 470, 682], \"label\": \"Section-header\"}, {\"bbox_2d\": [200, 732, 254, 742], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 782, 148, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [625, 823, 688, 832], \"label\": \"Section-header\"}, {\"bbox_2d\": [838, 883, 898, 892], \"label\": \"Section-header\"}, {\"bbox_2d\": [838, 894, 922, 973], \"label\": \"Text\"}, {\"bbox_2d\": [838, 332, 938, 873], \"label\": \"Text\"}, {\"bbox_2d\": [732, 332, 804, 973], \"label\": \"Text\"}, {\"bbox_2d\": [625, 834, 692, 973], \"label\": \"Text\"}, {\"bbox_2d\": [625, 332, 702, 813], \"label\": \"Text\"}, {\"bbox_2d\": [519, 332, 581, 451], \"label\": \"Text\"}, {\"bbox_2d\": [519, 472, 613, 973], \"label\": \"Text\"}, {\"bbox_2d\": [413, 683, 507, 973], \"label\": \"Text\"}, {\"bbox_2d\": [200, 743, 264, 973], \"label\": \"Text\"}, {\"bbox_2d\": [94, 793, 162, 973], \"label\": \"Text\"}, {\"bbox_2d\": [306, 442, 391, 973], \"label\": \"Text\"}, {\"bbox_2d\": [413, 442, 474, 662], \"label\": \"Text\"}, {\"bbox_2d\": [413, 332, 479, 421], \"label\": \"Text\"}, {\"bbox_2d\": [306, 332, 373, 421], \"label\": \"Text\"}, {\"bbox_2d\": [200, 332, 267, 411], \"label\": \"Text\"}, {\"bbox_2d\": [94, 342, 165, 371], \"label\": \"Text\"}, {\"bbox_2d\": [94, 392, 148, 421], \"label\": \"Text\"}, {\"bbox_2d\": [94, 442, 151, 481], \"label\": \"Text\"}, {\"bbox_2d\": [200, 432, 277, 722], \"label\": \"Text\"}, {\"bbox_2d\": [94, 613, 168, 772], \"label\": \"Text\"}, {\"bbox_2d\": [94, 543, 149, 592], \"label\": \"Text\"}, {\"bbox_2d\": [94, 502, 155, 521], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [869, 25, 1009, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 112, 603, 123], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 123, 676, 134], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 148, 277, 159], \"label\": \"Text\"}, {\"bbox_2d\": [25, 159, 278, 170], \"label\": \"Text\"}, {\"bbox_2d\": [25, 170, 260, 182], \"label\": \"Text\"}, {\"bbox_2d\": [25, 182, 258, 187], \"label\": \"Text\"}, {\"bbox_2d\": [25, 201, 900, 211], \"label\": \"Text\"}, {\"bbox_2d\": [25, 211, 252, 222], \"label\": \"Text\"}, {\"bbox_2d\": [25, 222, 573, 233], \"label\": \"Text\"}, {\"bbox_2d\": [25, 233, 573, 243], \"label\": \"Text\"}, {\"bbox_2d\": [25, 255, 277, 265], \"label\": \"Text\"}, {\"bbox_2d\": [25, 265, 507, 275], \"label\": \"Text\"}, {\"bbox_2d\": [25, 275, 284, 285], \"label\": \"Text\"}, {\"bbox_2d\": [25, 285, 284, 295], \"label\": \"Text\"}, {\"bbox_2d\": [25, 295, 511, 313], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 317, 1009, 347], \"label\": \"Text\"}, {\"bbox_2d\": [25, 361, 1009, 382], \"label\": \"Text\"}, {\"bbox_2d\": [25, 395, 1009, 425], \"label\": \"Text\"}, {\"bbox_2d\": [25, 438, 1009, 459], \"label\": \"Text\"}, {\"bbox_2d\": [25, 472, 1009, 502], \"label\": \"Text\"}, {\"bbox_2d\": [25, 515, 1009, 535], \"label\": \"Text\"}, {\"bbox_2d\": [25, 560, 1009, 579], \"label\": \"Text\"}, {\"bbox_2d\": [25, 593, 1009, 614], \"label\": \"Text\"}, {\"bbox_2d\": [25, 627, 1009, 667], \"label\": \"Text\"}, {\"bbox_2d\": [25, 548, 322, 559], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 732, 1009, 779], \"label\": \"Text\"}, {\"bbox_2d\": [26, 672, 1008, 712], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [870, 24, 1009, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 114, 599, 123], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 123, 672, 133], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 157, 278, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 167, 158, 177], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 177, 257, 187], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 210, 171, 220], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 220, 364, 230], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 230, 572, 240], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 263, 507, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 273, 284, 283], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 283, 131, 293], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 147, 275, 157], \"label\": \"Text\"}, {\"bbox_2d\": [25, 200, 900, 210], \"label\": \"Text\"}, {\"bbox_2d\": [25, 253, 275, 263], \"label\": \"Text\"}, {\"bbox_2d\": [25, 318, 1009, 347], \"label\": \"Text\"}, {\"bbox_2d\": [25, 361, 1009, 381], \"label\": \"Text\"}, {\"bbox_2d\": [25, 394, 1009, 424], \"label\": \"Text\"}, {\"bbox_2d\": [25, 438, 1009, 457], \"label\": \"Text\"}, {\"bbox_2d\": [25, 471, 1009, 501], \"label\": \"Text\"}, {\"bbox_2d\": [25, 514, 1009, 534], \"label\": \"Text\"}, {\"bbox_2d\": [25, 559, 1009, 579], \"label\": \"Text\"}, {\"bbox_2d\": [25, 592, 1009, 612], \"label\": \"Text\"}, {\"bbox_2d\": [25, 625, 1009, 665], \"label\": \"Text\"}, {\"bbox_2d\": [25, 548, 322, 558], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 306, 624, 317], \"label\": \"Section-header\"}, {\"bbox_2d\": [23, 672, 1010, 711], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [526, 37, 546, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 78, 875, 111], \"label\": \"Text\"}, {\"bbox_2d\": [115, 116, 853, 164], \"label\": \"Text\"}, {\"bbox_2d\": [115, 169, 849, 228], \"label\": \"Text\"}, {\"bbox_2d\": [115, 233, 851, 275], \"label\": \"Text\"}, {\"bbox_2d\": [115, 280, 845, 333], \"label\": \"Text\"}, {\"bbox_2d\": [115, 337, 844, 366], \"label\": \"Text\"}, {\"bbox_2d\": [115, 371, 844, 427], \"label\": \"Text\"}, {\"bbox_2d\": [115, 432, 851, 465], \"label\": \"Text\"}, {\"bbox_2d\": [115, 470, 851, 503], \"label\": \"Text\"}, {\"bbox_2d\": [115, 508, 849, 565], \"label\": \"Text\"}, {\"bbox_2d\": [115, 570, 849, 603], \"label\": \"Text\"}, {\"bbox_2d\": [115, 608, 851, 660], \"label\": \"Text\"}, {\"bbox_2d\": [115, 665, 851, 763], \"label\": \"Text\"}, {\"bbox_2d\": [115, 768, 849, 801], \"label\": \"Text\"}, {\"bbox_2d\": [115, 806, 849, 914], \"label\": \"Text\"}, {\"bbox_2d\": [115, 919, 851, 940], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [527, 40, 544, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 77, 957, 113], \"label\": \"Text\"}, {\"bbox_2d\": [161, 117, 936, 133], \"label\": \"Text\"}, {\"bbox_2d\": [116, 136, 958, 192], \"label\": \"Text\"}, {\"bbox_2d\": [116, 195, 959, 231], \"label\": \"Text\"}, {\"bbox_2d\": [116, 235, 958, 271], \"label\": \"Text\"}, {\"bbox_2d\": [116, 275, 959, 330], \"label\": \"Text\"}, {\"bbox_2d\": [117, 333, 959, 368], \"label\": \"Text\"}, {\"bbox_2d\": [117, 372, 959, 428], \"label\": \"Text\"}, {\"bbox_2d\": [117, 431, 959, 465], \"label\": \"Text\"}, {\"bbox_2d\": [117, 470, 959, 506], \"label\": \"Text\"}, {\"bbox_2d\": [117, 510, 959, 565], \"label\": \"Text\"}, {\"bbox_2d\": [117, 569, 959, 605], \"label\": \"Text\"}, {\"bbox_2d\": [117, 608, 959, 663], \"label\": \"Text\"}, {\"bbox_2d\": [117, 667, 959, 762], \"label\": \"Text\"}, {\"bbox_2d\": [117, 765, 959, 801], \"label\": \"Text\"}, {\"bbox_2d\": [117, 804, 960, 920], \"label\": \"Text\"}, {\"bbox_2d\": [163, 922, 960, 940], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [294, 84, 542, 97], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 105, 926, 198], \"label\": \"Text\"}, {\"bbox_2d\": [360, 206, 926, 265], \"label\": \"Text\"}, {\"bbox_2d\": [360, 273, 915, 301], \"label\": \"Text\"}, {\"bbox_2d\": [360, 328, 584, 341], \"label\": \"Text\"}, {\"bbox_2d\": [294, 309, 552, 320], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 350, 923, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 383, 904, 410], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 418, 913, 462], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 472, 842, 484], \"label\": \"Text\"}, {\"bbox_2d\": [356, 533, 923, 559], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 567, 889, 594], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 602, 888, 645], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 657, 820, 669], \"label\": \"Text\"}, {\"bbox_2d\": [294, 676, 542, 688], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 833, 710, 861], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 869, 895, 896], \"label\": \"List-item\"}, {\"bbox_2d\": [356, 748, 908, 789], \"label\": \"Text\"}, {\"bbox_2d\": [356, 800, 887, 827], \"label\": \"Text\"}, {\"bbox_2d\": [356, 696, 901, 739], \"label\": \"Text\"}, {\"bbox_2d\": [356, 750, 908, 789], \"label\": \"Text\"}, {\"bbox_2d\": [356, 512, 609, 524], \"label\": \"Text\"}, {\"bbox_2d\": [294, 491, 542, 503], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 598, 887, 645], \"label\": \"List-item\"}, {\"bbox_2d\": [600, 980, 931, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 542, 98], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 308, 552, 321], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 492, 540, 505], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 676, 542, 689], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 833, 926, 862], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 868, 889, 897], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 747, 903, 826], \"label\": \"Text\"}, {\"bbox_2d\": [360, 696, 896, 740], \"label\": \"Text\"}, {\"bbox_2d\": [360, 657, 819, 670], \"label\": \"Text\"}, {\"bbox_2d\": [360, 512, 607, 525], \"label\": \"Text\"}, {\"bbox_2d\": [360, 473, 842, 486], \"label\": \"Text\"}, {\"bbox_2d\": [360, 273, 910, 302], \"label\": \"Text\"}, {\"bbox_2d\": [360, 206, 920, 266], \"label\": \"Text\"}, {\"bbox_2d\": [360, 104, 921, 200], \"label\": \"Text\"}, {\"bbox_2d\": [360, 328, 607, 341], \"label\": \"Text\"}, {\"bbox_2d\": [360, 347, 919, 376], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 382, 899, 411], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 418, 881, 462], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 531, 919, 560], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 567, 887, 595], \"label\": \"List-item\"}, {\"bbox_2d\": [360, 602, 881, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [601, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [547, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 45, 209, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 193, 908, 222], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 323, 740, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 420, 774, 433], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 897, 910, 926], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 692, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 449, 483, 461], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 351, 510, 363], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 239, 572, 250], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 93, 595, 105], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 113, 646, 174], \"label\": \"Text\"}, {\"bbox_2d\": [255, 262, 764, 303], \"label\": \"Text\"}, {\"bbox_2d\": [255, 374, 662, 403], \"label\": \"Text\"}, {\"bbox_2d\": [255, 474, 778, 546], \"label\": \"Text\"}, {\"bbox_2d\": [255, 567, 916, 595], \"label\": \"Text\"}, {\"bbox_2d\": [255, 865, 457, 876], \"label\": \"Text\"}, {\"bbox_2d\": [273, 613, 570, 844], \"label\": \"Text\"}, {\"bbox_2d\": [255, 595, 695, 606], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [109, 45, 210, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [546, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [255, 93, 594, 104], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 450, 481, 461], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 194, 913, 221], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 323, 745, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 421, 775, 433], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 896, 915, 924], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 115, 856, 174], \"label\": \"Text\"}, {\"bbox_2d\": [255, 238, 571, 249], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 261, 763, 304], \"label\": \"Text\"}, {\"bbox_2d\": [255, 352, 508, 363], \"label\": \"Caption\"}, {\"bbox_2d\": [255, 374, 662, 402], \"label\": \"Text\"}, {\"bbox_2d\": [255, 473, 924, 877], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 48, 295, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 127, 919, 192], \"label\": \"Text\"}, {\"bbox_2d\": [121, 192, 918, 258], \"label\": \"Text\"}, {\"bbox_2d\": [121, 259, 918, 342], \"label\": \"Text\"}, {\"bbox_2d\": [121, 342, 917, 374], \"label\": \"Text\"}, {\"bbox_2d\": [121, 374, 627, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 391, 727, 408], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 408, 749, 425], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 425, 917, 490], \"label\": \"Text\"}, {\"bbox_2d\": [121, 490, 917, 556], \"label\": \"Text\"}, {\"bbox_2d\": [121, 557, 917, 640], \"label\": \"Text\"}, {\"bbox_2d\": [147, 640, 648, 657], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 657, 917, 723], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 723, 613, 740], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 740, 917, 806], \"label\": \"Text\"}, {\"bbox_2d\": [456, 806, 615, 838], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 839, 344, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 854, 917, 937], \"label\": \"Text\"}, {\"bbox_2d\": [121, 972, 444, 988], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 126, 938, 192], \"label\": \"Text\"}, {\"bbox_2d\": [121, 193, 966, 258], \"label\": \"Text\"}, {\"bbox_2d\": [121, 259, 955, 341], \"label\": \"Text\"}, {\"bbox_2d\": [121, 342, 962, 374], \"label\": \"Text\"}, {\"bbox_2d\": [182, 391, 742, 407], \"label\": \"Text\"}, {\"bbox_2d\": [121, 441, 955, 490], \"label\": \"Text\"}, {\"bbox_2d\": [121, 491, 950, 556], \"label\": \"Text\"}, {\"bbox_2d\": [121, 557, 954, 639], \"label\": \"Text\"}, {\"bbox_2d\": [182, 639, 653, 655], \"label\": \"Text\"}, {\"bbox_2d\": [121, 739, 964, 804], \"label\": \"Text\"}, {\"bbox_2d\": [121, 855, 962, 937], \"label\": \"Text\"}, {\"bbox_2d\": [121, 838, 342, 854], \"label\": \"Section-header\"}, {\"bbox_2d\": [426, 821, 612, 837], \"label\": \"Section-header\"}, {\"bbox_2d\": [456, 805, 584, 821], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 375, 644, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [145, 408, 768, 424], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 424, 892, 440], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 656, 947, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 722, 624, 738], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 661, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [274, 404, 583, 420], \"label\": \"Text\"}, {\"bbox_2d\": [274, 436, 764, 452], \"label\": \"Text\"}, {\"bbox_2d\": [274, 778, 627, 794], \"label\": \"Text\"}, {\"bbox_2d\": [99, 975, 379, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [294, 464, 825, 734], \"label\": \"Table\"}, {\"bbox_2d\": [272, 97, 911, 378], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [272, 86, 663, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 407, 582, 420], \"label\": \"Text\"}, {\"bbox_2d\": [273, 439, 764, 452], \"label\": \"Text\"}, {\"bbox_2d\": [472, 463, 648, 475], \"label\": \"Section-header\"}, {\"bbox_2d\": [297, 486, 824, 733], \"label\": \"Text\"}, {\"bbox_2d\": [273, 782, 626, 795], \"label\": \"Text\"}, {\"bbox_2d\": [272, 98, 911, 379], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 205, 963, 797], \"label\": \"Table\"}, {\"bbox_2d\": [121, 801, 381, 814], \"label\": \"Text\"}, {\"bbox_2d\": [814, 1001, 986, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [1011, 770, 1028, 863], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [122, 209, 964, 794], \"label\": \"Table\"}, {\"bbox_2d\": [123, 804, 379, 815], \"label\": \"Text\"}, {\"bbox_2d\": [1014, 771, 1024, 861], \"label\": \"Page-header\"}, {\"bbox_2d\": [813, 1002, 985, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [198, 69, 934, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [198, 200, 372, 213], \"label\": \"Section-header\"}, {\"bbox_2d\": [198, 709, 522, 722], \"label\": \"Section-header\"}, {\"bbox_2d\": [198, 768, 365, 781], \"label\": \"Section-header\"}, {\"bbox_2d\": [198, 853, 429, 866], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 946, 18, 959], \"label\": \"Page-footer\"}, {\"bbox_2d\": [35, 958, 147, 1019], \"label\": \"Picture\"}, {\"bbox_2d\": [198, 555, 951, 616], \"label\": \"Text\"}, {\"bbox_2d\": [229, 638, 944, 930], \"label\": \"Table\"}, {\"bbox_2d\": [228, 285, 943, 531], \"label\": \"Table\"}, {\"bbox_2d\": [237, 234, 943, 264], \"label\": \"Table\"}, {\"bbox_2d\": [238, 220, 446, 262], \"label\": \"Text\"}, {\"bbox_2d\": [451, 139, 922, 190], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [198, 69, 937, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 138, 945, 267], \"label\": \"Table\"}, {\"bbox_2d\": [199, 554, 952, 617], \"label\": \"List-item\"}, {\"bbox_2d\": [197, 639, 945, 930], \"label\": \"Table\"}, {\"bbox_2d\": [113, 948, 133, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [44, 960, 145, 1020], \"label\": \"Picture\"}, {\"bbox_2d\": [239, 286, 943, 533], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 546, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 113, 909, 140], \"label\": \"Text\"}, {\"bbox_2d\": [231, 470, 926, 513], \"label\": \"Text\"}, {\"bbox_2d\": [231, 531, 834, 542], \"label\": \"Text\"}, {\"bbox_2d\": [231, 666, 926, 716], \"label\": \"Text\"}, {\"bbox_2d\": [231, 757, 926, 879], \"label\": \"Text\"}, {\"bbox_2d\": [231, 737, 708, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [461, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 152, 915, 179], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 189, 501, 200], \"label\": \"Text\"}, {\"bbox_2d\": [255, 211, 890, 261], \"label\": \"Text\"}, {\"bbox_2d\": [231, 273, 452, 283], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 294, 918, 321], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 332, 763, 406], \"label\": \"Text\"}, {\"bbox_2d\": [231, 417, 620, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 451, 674, 465], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 551, 926, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 622, 926, 649], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 546, 107], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 450, 675, 464], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 738, 708, 752], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 113, 908, 140], \"label\": \"Text\"}, {\"bbox_2d\": [255, 189, 509, 200], \"label\": \"Text\"}, {\"bbox_2d\": [255, 211, 889, 223], \"label\": \"Text\"}, {\"bbox_2d\": [255, 233, 898, 260], \"label\": \"Text\"}, {\"bbox_2d\": [231, 466, 926, 513], \"label\": \"Text\"}, {\"bbox_2d\": [231, 530, 834, 542], \"label\": \"Text\"}, {\"bbox_2d\": [231, 666, 926, 694], \"label\": \"Text\"}, {\"bbox_2d\": [231, 704, 679, 716], \"label\": \"Text\"}, {\"bbox_2d\": [231, 758, 926, 879], \"label\": \"Text\"}, {\"bbox_2d\": [461, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 151, 914, 178], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 271, 451, 283], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 293, 919, 321], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 416, 621, 428], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 552, 926, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 621, 926, 649], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 331, 763, 406], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [292, 83, 695, 98], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 96, 935, 506], \"label\": \"Table\"}, {\"bbox_2d\": [292, 569, 619, 582], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 747, 924, 778], \"label\": \"Text\"}, {\"bbox_2d\": [292, 782, 837, 797], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 801, 408, 816], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 717, 383, 739], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 584, 935, 686], \"label\": \"Table\"}, {\"bbox_2d\": [121, 535, 223, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [629, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 833, 690, 848], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [293, 86, 697, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 582, 932, 688], \"label\": \"Table\"}, {\"bbox_2d\": [292, 98, 932, 506], \"label\": \"Table\"}, {\"bbox_2d\": [293, 571, 620, 581], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 539, 222, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 720, 383, 737], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 749, 923, 778], \"label\": \"Text\"}, {\"bbox_2d\": [293, 835, 690, 849], \"label\": \"Text\"}, {\"bbox_2d\": [293, 782, 837, 798], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 804, 408, 817], \"label\": \"List-item\"}, {\"bbox_2d\": [630, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [814, 1003, 1003, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [52, 962, 953, 1001], \"label\": \"Text\"}, {\"bbox_2d\": [282, 671, 962, 697], \"label\": \"List-item\"}, {\"bbox_2d\": [282, 703, 968, 742], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 632, 646, 645], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 651, 602, 664], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 847, 974, 950], \"label\": \"Text\"}, {\"bbox_2d\": [255, 748, 968, 787], \"label\": \"Text\"}, {\"bbox_2d\": [255, 793, 947, 833], \"label\": \"Text\"}, {\"bbox_2d\": [255, 547, 975, 625], \"label\": \"Text\"}, {\"bbox_2d\": [255, 462, 975, 540], \"label\": \"Text\"}, {\"bbox_2d\": [255, 410, 965, 450], \"label\": \"Text\"}, {\"bbox_2d\": [255, 333, 965, 400], \"label\": \"Text\"}, {\"bbox_2d\": [255, 271, 978, 321], \"label\": \"Text\"}, {\"bbox_2d\": [255, 208, 973, 260], \"label\": \"Text\"}, {\"bbox_2d\": [255, 157, 962, 195], \"label\": \"Text\"}, {\"bbox_2d\": [79, 157, 204, 183], \"label\": \"Text\"}, {\"bbox_2d\": [79, 208, 231, 247], \"label\": \"Text\"}, {\"bbox_2d\": [79, 271, 222, 283], \"label\": \"Text\"}, {\"bbox_2d\": [79, 333, 206, 346], \"label\": \"Text\"}, {\"bbox_2d\": [79, 410, 230, 424], \"label\": \"Text\"}, {\"bbox_2d\": [79, 462, 229, 489], \"label\": \"Text\"}, {\"bbox_2d\": [52, 138, 613, 150], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [53, 963, 958, 1001], \"label\": \"Text\"}, {\"bbox_2d\": [867, 1003, 1004, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [53, 154, 981, 959], \"label\": \"Table\"}, {\"bbox_2d\": [53, 139, 615, 150], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 918, 136], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 690, 871, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 731, 451, 744], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 775, 658, 788], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 820, 779, 855], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 864, 545, 877], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 621, 485, 637], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 655, 917, 683], \"label\": \"Text\"}, {\"bbox_2d\": [255, 755, 601, 767], \"label\": \"Text\"}, {\"bbox_2d\": [255, 798, 601, 810], \"label\": \"Text\"}, {\"bbox_2d\": [255, 841, 601, 853], \"label\": \"Text\"}, {\"bbox_2d\": [255, 885, 601, 914], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 698, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 154, 927, 580], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 584, 484, 595], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 918, 136], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 693, 871, 721], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 731, 461, 743], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 776, 669, 788], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 820, 789, 832], \"label\": \"List-item\"}, {\"bbox_2d\": [257, 865, 556, 877], \"label\": \"List-item\"}, {\"bbox_2d\": [280, 887, 601, 914], \"label\": \"Text\"}, {\"bbox_2d\": [280, 843, 601, 854], \"label\": \"Text\"}, {\"bbox_2d\": [280, 798, 601, 810], \"label\": \"Text\"}, {\"bbox_2d\": [280, 754, 601, 765], \"label\": \"Text\"}, {\"bbox_2d\": [231, 655, 921, 683], \"label\": \"Text\"}, {\"bbox_2d\": [255, 155, 926, 579], \"label\": \"Picture\"}, {\"bbox_2d\": [255, 584, 483, 594], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 621, 485, 637], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [85, 44, 931, 105], \"label\": \"Text\"}, {\"bbox_2d\": [85, 118, 967, 231], \"label\": \"Table\"}, {\"bbox_2d\": [85, 243, 931, 274], \"label\": \"Text\"}, {\"bbox_2d\": [85, 286, 954, 381], \"label\": \"Text\"}, {\"bbox_2d\": [85, 409, 945, 474], \"label\": \"Text\"}, {\"bbox_2d\": [85, 501, 945, 547], \"label\": \"Text\"}, {\"bbox_2d\": [85, 560, 965, 591], \"label\": \"Text\"}, {\"bbox_2d\": [85, 620, 925, 651], \"label\": \"Text\"}, {\"bbox_2d\": [85, 663, 363, 677], \"label\": \"Text\"}, {\"bbox_2d\": [85, 680, 963, 789], \"label\": \"Text\"}, {\"bbox_2d\": [85, 803, 960, 834], \"label\": \"Text\"}, {\"bbox_2d\": [85, 998, 99, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 604, 218, 620], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 395, 253, 408], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 486, 474, 500], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [85, 42, 938, 105], \"label\": \"Text\"}, {\"bbox_2d\": [85, 241, 935, 273], \"label\": \"Text\"}, {\"bbox_2d\": [85, 285, 959, 380], \"label\": \"Text\"}, {\"bbox_2d\": [85, 408, 949, 471], \"label\": \"Text\"}, {\"bbox_2d\": [85, 499, 950, 546], \"label\": \"Text\"}, {\"bbox_2d\": [85, 559, 969, 590], \"label\": \"Text\"}, {\"bbox_2d\": [85, 619, 931, 650], \"label\": \"Text\"}, {\"bbox_2d\": [85, 678, 969, 789], \"label\": \"Text\"}, {\"bbox_2d\": [85, 801, 964, 832], \"label\": \"Text\"}, {\"bbox_2d\": [85, 997, 100, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [85, 392, 254, 408], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 484, 478, 499], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 600, 218, 619], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 662, 367, 677], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 115, 968, 228], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [31, 359, 41, 677], \"label\": \"Page-header\"}, {\"bbox_2d\": [507, 993, 525, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 359, 44, 676], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 990, 526, 1011], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [679, 20, 936, 43], \"label\": \"Page-header\"}, {\"bbox_2d\": [98, 87, 936, 160], \"label\": \"Text\"}, {\"bbox_2d\": [98, 194, 172, 213], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 321, 302, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 627, 325, 646], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 357, 936, 412], \"label\": \"Text\"}, {\"bbox_2d\": [98, 664, 933, 718], \"label\": \"Text\"}, {\"bbox_2d\": [98, 736, 936, 755], \"label\": \"Text\"}, {\"bbox_2d\": [98, 984, 182, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [923, 966, 953, 1010], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 770, 937, 838], \"label\": \"Picture\"}, {\"bbox_2d\": [473, 462, 746, 524], \"label\": \"Caption\"}, {\"bbox_2d\": [325, 445, 451, 541], \"label\": \"Picture\"}, {\"bbox_2d\": [325, 189, 756, 282], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [679, 22, 942, 43], \"label\": \"Page-header\"}, {\"bbox_2d\": [98, 86, 948, 160], \"label\": \"Text\"}, {\"bbox_2d\": [98, 194, 175, 214], \"label\": \"Text\"}, {\"bbox_2d\": [98, 357, 943, 412], \"label\": \"Text\"}, {\"bbox_2d\": [98, 663, 939, 719], \"label\": \"Text\"}, {\"bbox_2d\": [98, 735, 941, 755], \"label\": \"Text\"}, {\"bbox_2d\": [98, 320, 307, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 627, 330, 647], \"label\": \"Section-header\"}, {\"bbox_2d\": [326, 191, 450, 284], \"label\": \"Picture\"}, {\"bbox_2d\": [324, 447, 452, 541], \"label\": \"Picture\"}, {\"bbox_2d\": [475, 462, 750, 521], \"label\": \"Caption\"}, {\"bbox_2d\": [472, 207, 742, 266], \"label\": \"Caption\"}, {\"bbox_2d\": [913, 965, 959, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 985, 187, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [206, 774, 932, 828], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [675, 71, 814, 84], \"label\": \"Section-header\"}, {\"bbox_2d\": [540, 179, 951, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [210, 509, 386, 522], \"label\": \"Section-header\"}, {\"bbox_2d\": [611, 789, 822, 802], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 71, 223, 84], \"label\": \"Text\"}, {\"bbox_2d\": [91, 183, 506, 210], \"label\": \"Text\"}, {\"bbox_2d\": [91, 272, 130, 285], \"label\": \"Text\"}, {\"bbox_2d\": [91, 336, 167, 349], \"label\": \"Text\"}, {\"bbox_2d\": [91, 400, 506, 478], \"label\": \"Text\"}, {\"bbox_2d\": [91, 541, 506, 949], \"label\": \"Text\"}, {\"bbox_2d\": [536, 227, 951, 343], \"label\": \"Text\"}, {\"bbox_2d\": [536, 444, 951, 491], \"label\": \"Text\"}, {\"bbox_2d\": [536, 581, 951, 608], \"label\": \"Text\"}, {\"bbox_2d\": [536, 664, 935, 691], \"label\": \"Text\"}, {\"bbox_2d\": [536, 744, 912, 757], \"label\": \"Text\"}, {\"bbox_2d\": [536, 820, 951, 877], \"label\": \"Text\"}, {\"bbox_2d\": [536, 890, 951, 949], \"label\": \"Text\"}, {\"bbox_2d\": [568, 833, 951, 853], \"label\": \"Formula\"}, {\"bbox_2d\": [568, 862, 814, 877], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 697, 951, 734], \"label\": \"Formula\"}, {\"bbox_2d\": [580, 618, 951, 654], \"label\": \"Formula\"}, {\"bbox_2d\": [675, 505, 951, 530], \"label\": \"Formula\"}, {\"bbox_2d\": [675, 408, 951, 423], \"label\": \"Formula\"}, {\"bbox_2d\": [675, 357, 951, 372], \"label\": \"Formula\"}, {\"bbox_2d\": [205, 241, 506, 256], \"label\": \"Formula\"}, {\"bbox_2d\": [184, 295, 506, 321], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 366, 506, 387], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 394, 506, 414], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 391, 506, 414], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 382, 562, 395], \"label\": \"Text\"}, {\"bbox_2d\": [536, 444, 951, 491], \"label\": \"Text\"}, {\"bbox_2d\": [536, 105, 951, 148], \"label\": \"Text\"}, {\"bbox_2d\": [157, 97, 506, 163], \"label\": \"Formula\"}, {\"bbox_2d\": [157, 166, 506, 188], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [91, 71, 222, 84], \"label\": \"Text\"}, {\"bbox_2d\": [91, 181, 506, 211], \"label\": \"Text\"}, {\"bbox_2d\": [91, 272, 129, 285], \"label\": \"Text\"}, {\"bbox_2d\": [91, 335, 197, 348], \"label\": \"Text\"}, {\"bbox_2d\": [91, 402, 506, 475], \"label\": \"Text\"}, {\"bbox_2d\": [91, 542, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 889, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 662, 950, 690], \"label\": \"Text\"}, {\"bbox_2d\": [536, 744, 913, 757], \"label\": \"Text\"}, {\"bbox_2d\": [536, 579, 951, 607], \"label\": \"Text\"}, {\"bbox_2d\": [536, 440, 951, 492], \"label\": \"Text\"}, {\"bbox_2d\": [536, 382, 563, 395], \"label\": \"Text\"}, {\"bbox_2d\": [536, 225, 951, 343], \"label\": \"Text\"}, {\"bbox_2d\": [536, 104, 952, 147], \"label\": \"Text\"}, {\"bbox_2d\": [673, 72, 814, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [539, 179, 948, 203], \"label\": \"Section-header\"}, {\"bbox_2d\": [611, 788, 877, 800], \"label\": \"Section-header\"}, {\"bbox_2d\": [211, 510, 386, 520], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 821, 950, 849], \"label\": \"Text\"}, {\"bbox_2d\": [676, 862, 951, 875], \"label\": \"Formula\"}, {\"bbox_2d\": [655, 702, 951, 733], \"label\": \"Formula\"}, {\"bbox_2d\": [578, 619, 951, 658], \"label\": \"Formula\"}, {\"bbox_2d\": [673, 503, 951, 534], \"label\": \"Formula\"}, {\"bbox_2d\": [671, 537, 951, 568], \"label\": \"Formula\"}, {\"bbox_2d\": [676, 356, 951, 369], \"label\": \"Formula\"}, {\"bbox_2d\": [691, 400, 951, 431], \"label\": \"Formula\"}, {\"bbox_2d\": [156, 98, 506, 138], \"label\": \"Formula\"}, {\"bbox_2d\": [158, 134, 506, 175], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 240, 506, 265], \"label\": \"Formula\"}, {\"bbox_2d\": [187, 292, 506, 323], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 359, 506, 390], \"label\": \"Formula\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 27, 328, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 741, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 123, 250, 132], \"label\": \"Text\"}, {\"bbox_2d\": [91, 607, 493, 616], \"label\": \"Text\"}, {\"bbox_2d\": [88, 169, 947, 595], \"label\": \"Table\"}, {\"bbox_2d\": [91, 999, 445, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 327, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 96, 741, 116], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 123, 250, 132], \"label\": \"Text\"}, {\"bbox_2d\": [91, 608, 493, 616], \"label\": \"Text\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [89, 171, 949, 594], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [126, 222, 804, 310], \"label\": \"Section-header\"}, {\"bbox_2d\": [143, 108, 375, 167], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [142, 110, 372, 169], \"label\": \"Picture\"}, {\"bbox_2d\": [125, 221, 818, 311], \"label\": \"Title\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [51, 542, 65, 564], \"label\": \"Page-header\"}, {\"bbox_2d\": [934, 54, 951, 248], \"label\": \"Page-header\"}, {\"bbox_2d\": [75, 90, 932, 904], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [35, 62, 988, 905], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [777, 14, 998, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 99, 541, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 420, 450, 434], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 795, 516, 809], \"label\": \"Section-header\"}, {\"bbox_2d\": [862, 998, 990, 1016], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 881, 947, 890], \"label\": \"Text\"}, {\"bbox_2d\": [121, 891, 947, 923], \"label\": \"Text\"}, {\"bbox_2d\": [121, 715, 958, 738], \"label\": \"Text\"}, {\"bbox_2d\": [121, 735, 582, 746], \"label\": \"Text\"}, {\"bbox_2d\": [121, 365, 616, 376], \"label\": \"Text\"}, {\"bbox_2d\": [119, 120, 965, 365], \"label\": \"Table\"}, {\"bbox_2d\": [117, 440, 964, 715], \"label\": \"Table\"}, {\"bbox_2d\": [119, 813, 965, 878], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [778, 18, 998, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [122, 102, 541, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 424, 450, 434], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 441, 964, 713], \"label\": \"Table\"}, {\"bbox_2d\": [120, 816, 964, 878], \"label\": \"Table\"}, {\"bbox_2d\": [120, 882, 802, 891], \"label\": \"Text\"}, {\"bbox_2d\": [159, 893, 949, 922], \"label\": \"Text\"}, {\"bbox_2d\": [120, 716, 961, 735], \"label\": \"Text\"}, {\"bbox_2d\": [159, 737, 584, 745], \"label\": \"Text\"}, {\"bbox_2d\": [122, 798, 516, 808], \"label\": \"Section-header\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [117, 120, 963, 367], \"label\": \"Table\"}, {\"bbox_2d\": [120, 369, 616, 377], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [89, 60, 787, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [89, 122, 955, 909], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [88, 60, 790, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 122, 953, 908], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 549, 291, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [499, 940, 536, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 91, 906, 253], \"label\": \"Table\"}, {\"bbox_2d\": [157, 266, 665, 534], \"label\": \"Table\"}, {\"bbox_2d\": [153, 577, 915, 882], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [159, 94, 906, 254], \"label\": \"Table\"}, {\"bbox_2d\": [158, 269, 663, 532], \"label\": \"Table\"}, {\"bbox_2d\": [151, 579, 914, 881], \"label\": \"Table\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 547, 294, 564], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 32, 409, 88], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 139, 911, 196], \"label\": \"Text\"}, {\"bbox_2d\": [121, 323, 911, 397], \"label\": \"Text\"}, {\"bbox_2d\": [121, 417, 911, 474], \"label\": \"Text\"}, {\"bbox_2d\": [121, 617, 911, 675], \"label\": \"Text\"}, {\"bbox_2d\": [121, 890, 683, 904], \"label\": \"Text\"}, {\"bbox_2d\": [121, 922, 857, 936], \"label\": \"Text\"}, {\"bbox_2d\": [156, 215, 911, 272], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 275, 911, 303], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 496, 911, 599], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 694, 911, 769], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 772, 911, 871], \"label\": \"List-item\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [123, 136, 918, 196], \"label\": \"Text\"}, {\"bbox_2d\": [123, 322, 918, 397], \"label\": \"Text\"}, {\"bbox_2d\": [123, 414, 918, 474], \"label\": \"Text\"}, {\"bbox_2d\": [123, 615, 918, 675], \"label\": \"Text\"}, {\"bbox_2d\": [123, 889, 684, 904], \"label\": \"Text\"}, {\"bbox_2d\": [123, 921, 864, 936], \"label\": \"Text\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [133, 692, 918, 767], \"label\": \"List-item\"}, {\"bbox_2d\": [128, 767, 918, 871], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 493, 918, 598], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 213, 918, 273], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 273, 918, 303], \"label\": \"List-item\"}, {\"bbox_2d\": [122, 34, 410, 86], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [199, 65, 465, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [199, 137, 888, 237], \"label\": \"Text\"}, {\"bbox_2d\": [199, 265, 888, 386], \"label\": \"Text\"}, {\"bbox_2d\": [199, 414, 888, 514], \"label\": \"Text\"}, {\"bbox_2d\": [199, 541, 888, 639], \"label\": \"Text\"}, {\"bbox_2d\": [199, 667, 888, 830], \"label\": \"Text\"}, {\"bbox_2d\": [199, 858, 888, 958], \"label\": \"Text\"}, {\"bbox_2d\": [112, 944, 123, 955], \"label\": \"Page-header\"}, {\"bbox_2d\": [39, 958, 145, 1018], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [113, 948, 121, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [40, 959, 149, 1021], \"label\": \"Picture\"}, {\"bbox_2d\": [199, 415, 887, 512], \"label\": \"Text\"}, {\"bbox_2d\": [199, 542, 887, 639], \"label\": \"Text\"}, {\"bbox_2d\": [199, 669, 887, 829], \"label\": \"Text\"}, {\"bbox_2d\": [199, 860, 887, 956], \"label\": \"Text\"}, {\"bbox_2d\": [199, 267, 887, 385], \"label\": \"Text\"}, {\"bbox_2d\": [199, 140, 887, 237], \"label\": \"Text\"}, {\"bbox_2d\": [198, 69, 468, 83], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [533, 111, 944, 310], \"label\": \"Text\"}, {\"bbox_2d\": [533, 323, 944, 414], \"label\": \"Text\"}, {\"bbox_2d\": [533, 426, 944, 610], \"label\": \"Text\"}, {\"bbox_2d\": [533, 622, 944, 743], \"label\": \"Text\"}, {\"bbox_2d\": [82, 628, 500, 890], \"label\": \"Text\"}, {\"bbox_2d\": [533, 756, 944, 813], \"label\": \"Text\"}, {\"bbox_2d\": [821, 1004, 954, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [81, 108, 346, 123], \"label\": \"Section-header\"}, {\"bbox_2d\": [81, 126, 478, 155], \"label\": \"Text\"}, {\"bbox_2d\": [81, 386, 473, 412], \"label\": \"Text\"}, {\"bbox_2d\": [81, 424, 498, 530], \"label\": \"Text\"}, {\"bbox_2d\": [81, 542, 500, 616], \"label\": \"Text\"}, {\"bbox_2d\": [81, 628, 500, 890], \"label\": \"Text\"}, {\"bbox_2d\": [81, 162, 503, 365], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [533, 112, 954, 310], \"label\": \"Text\"}, {\"bbox_2d\": [81, 127, 482, 153], \"label\": \"Text\"}, {\"bbox_2d\": [81, 110, 345, 123], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 324, 951, 412], \"label\": \"Text\"}, {\"bbox_2d\": [81, 385, 475, 411], \"label\": \"Text\"}, {\"bbox_2d\": [81, 424, 499, 529], \"label\": \"Text\"}, {\"bbox_2d\": [81, 542, 501, 615], \"label\": \"Text\"}, {\"bbox_2d\": [533, 426, 952, 609], \"label\": \"Text\"}, {\"bbox_2d\": [533, 755, 940, 813], \"label\": \"Text\"}, {\"bbox_2d\": [533, 622, 950, 742], \"label\": \"Text\"}, {\"bbox_2d\": [81, 628, 499, 890], \"label\": \"Text\"}, {\"bbox_2d\": [822, 1005, 955, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [79, 164, 502, 364], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 64, 267, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [103, 103, 429, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 134, 196, 149], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 157, 478, 323], \"label\": \"Text\"}, {\"bbox_2d\": [62, 326, 480, 677], \"label\": \"Text\"}, {\"bbox_2d\": [62, 680, 481, 772], \"label\": \"Text\"}, {\"bbox_2d\": [62, 924, 482, 960], \"label\": \"Text\"}, {\"bbox_2d\": [513, 103, 926, 233], \"label\": \"Text\"}, {\"bbox_2d\": [513, 236, 926, 709], \"label\": \"Text\"}, {\"bbox_2d\": [513, 713, 926, 812], \"label\": \"Text\"}, {\"bbox_2d\": [573, 827, 766, 865], \"label\": \"Formula\"}, {\"bbox_2d\": [573, 906, 874, 943], \"label\": \"Formula\"}, {\"bbox_2d\": [235, 789, 312, 825], \"label\": \"Formula\"}, {\"bbox_2d\": [72, 841, 125, 856], \"label\": \"Text\"}, {\"bbox_2d\": [109, 866, 439, 916], \"label\": \"Text\"}, {\"bbox_2d\": [513, 885, 572, 899], \"label\": \"Text\"}, {\"bbox_2d\": [492, 988, 512, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [63, 62, 290, 93], \"label\": \"Page-header\"}, {\"bbox_2d\": [61, 107, 425, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 101, 936, 231], \"label\": \"Text\"}, {\"bbox_2d\": [61, 156, 485, 323], \"label\": \"Text\"}, {\"bbox_2d\": [514, 233, 941, 709], \"label\": \"Text\"}, {\"bbox_2d\": [63, 324, 494, 679], \"label\": \"Text\"}, {\"bbox_2d\": [518, 711, 937, 810], \"label\": \"Text\"}, {\"bbox_2d\": [68, 680, 486, 772], \"label\": \"Text\"}, {\"bbox_2d\": [235, 795, 312, 826], \"label\": \"Formula\"}, {\"bbox_2d\": [682, 829, 758, 855], \"label\": \"Formula\"}, {\"bbox_2d\": [574, 903, 876, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [576, 922, 783, 939], \"label\": \"List-item\"}, {\"bbox_2d\": [585, 940, 801, 958], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 862, 331, 880], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 881, 446, 899], \"label\": \"List-item\"}, {\"bbox_2d\": [114, 900, 328, 917], \"label\": \"List-item\"}, {\"bbox_2d\": [519, 882, 581, 900], \"label\": \"Text\"}, {\"bbox_2d\": [71, 840, 132, 857], \"label\": \"Text\"}, {\"bbox_2d\": [71, 923, 490, 960], \"label\": \"Text\"}, {\"bbox_2d\": [491, 984, 514, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [814, 33, 989, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 73, 511, 170], \"label\": \"Text\"}, {\"bbox_2d\": [109, 174, 511, 305], \"label\": \"Text\"}, {\"bbox_2d\": [109, 308, 511, 423], \"label\": \"Text\"}, {\"bbox_2d\": [109, 426, 511, 556], \"label\": \"Text\"}, {\"bbox_2d\": [109, 559, 511, 707], \"label\": \"Text\"}, {\"bbox_2d\": [109, 743, 511, 858], \"label\": \"Text\"}, {\"bbox_2d\": [109, 862, 511, 974], \"label\": \"Text\"}, {\"bbox_2d\": [542, 712, 941, 842], \"label\": \"Text\"}, {\"bbox_2d\": [542, 845, 941, 924], \"label\": \"Text\"}, {\"bbox_2d\": [542, 927, 941, 974], \"label\": \"Text\"}, {\"bbox_2d\": [542, 75, 941, 106], \"label\": \"Text\"}, {\"bbox_2d\": [542, 109, 941, 203], \"label\": \"Text\"}, {\"bbox_2d\": [542, 206, 941, 337], \"label\": \"Text\"}, {\"bbox_2d\": [542, 340, 941, 372], \"label\": \"Text\"}, {\"bbox_2d\": [542, 595, 951, 623], \"label\": \"Caption\"}, {\"bbox_2d\": [542, 644, 941, 707], \"label\": \"Text\"}, {\"bbox_2d\": [109, 724, 240, 741], \"label\": \"Section-header\"}, {\"bbox_2d\": [538, 390, 964, 582], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [978, 35, 988, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [814, 35, 945, 45], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 73, 514, 170], \"label\": \"Text\"}, {\"bbox_2d\": [109, 173, 516, 304], \"label\": \"Text\"}, {\"bbox_2d\": [109, 307, 514, 421], \"label\": \"Text\"}, {\"bbox_2d\": [109, 424, 512, 555], \"label\": \"Text\"}, {\"bbox_2d\": [109, 558, 514, 705], \"label\": \"Text\"}, {\"bbox_2d\": [109, 742, 513, 856], \"label\": \"Text\"}, {\"bbox_2d\": [109, 859, 513, 973], \"label\": \"Text\"}, {\"bbox_2d\": [541, 926, 944, 973], \"label\": \"Text\"}, {\"bbox_2d\": [541, 842, 945, 923], \"label\": \"Text\"}, {\"bbox_2d\": [541, 708, 944, 839], \"label\": \"Text\"}, {\"bbox_2d\": [541, 641, 946, 705], \"label\": \"Text\"}, {\"bbox_2d\": [541, 340, 945, 371], \"label\": \"Text\"}, {\"bbox_2d\": [541, 206, 946, 337], \"label\": \"Text\"}, {\"bbox_2d\": [541, 106, 945, 203], \"label\": \"Text\"}, {\"bbox_2d\": [541, 73, 941, 103], \"label\": \"Text\"}, {\"bbox_2d\": [541, 591, 954, 623], \"label\": \"Caption\"}, {\"bbox_2d\": [540, 389, 964, 580], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 722, 241, 739], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [773, 14, 997, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 98, 291, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 118, 375, 164], \"label\": \"Text\"}, {\"bbox_2d\": [121, 166, 544, 178], \"label\": \"Text\"}, {\"bbox_2d\": [121, 214, 557, 228], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 461, 275, 475], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 477, 232, 491], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 746, 350, 759], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 778, 965, 856], \"label\": \"Table\"}, {\"bbox_2d\": [121, 779, 205, 788], \"label\": \"Text\"}, {\"bbox_2d\": [121, 666, 594, 676], \"label\": \"Text\"}, {\"bbox_2d\": [121, 677, 363, 687], \"label\": \"Text\"}, {\"bbox_2d\": [121, 688, 415, 698], \"label\": \"Text\"}, {\"bbox_2d\": [121, 699, 952, 720], \"label\": \"Text\"}, {\"bbox_2d\": [121, 377, 956, 387], \"label\": \"Text\"}, {\"bbox_2d\": [121, 388, 576, 398], \"label\": \"Text\"}, {\"bbox_2d\": [121, 400, 415, 409], \"label\": \"Text\"}, {\"bbox_2d\": [119, 230, 963, 374], \"label\": \"Table\"}, {\"bbox_2d\": [118, 494, 964, 664], \"label\": \"Table\"}, {\"bbox_2d\": [862, 998, 990, 1015], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [774, 20, 905, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [953, 18, 997, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [120, 119, 374, 164], \"label\": \"Text\"}, {\"bbox_2d\": [120, 170, 544, 178], \"label\": \"Text\"}, {\"bbox_2d\": [122, 102, 290, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 217, 558, 227], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 463, 276, 473], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 480, 233, 489], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 749, 350, 758], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 669, 594, 677], \"label\": \"Text\"}, {\"bbox_2d\": [120, 679, 365, 687], \"label\": \"Text\"}, {\"bbox_2d\": [120, 690, 415, 698], \"label\": \"Text\"}, {\"bbox_2d\": [120, 700, 956, 719], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [119, 767, 965, 857], \"label\": \"Table\"}, {\"bbox_2d\": [118, 495, 965, 663], \"label\": \"Table\"}, {\"bbox_2d\": [119, 231, 967, 371], \"label\": \"Table\"}, {\"bbox_2d\": [120, 378, 957, 386], \"label\": \"Text\"}, {\"bbox_2d\": [159, 388, 576, 397], \"label\": \"Text\"}, {\"bbox_2d\": [159, 399, 416, 407], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 83, 698, 332], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 353, 468, 373], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 384, 918, 427], \"label\": \"Text\"}, {\"bbox_2d\": [293, 447, 896, 475], \"label\": \"Text\"}, {\"bbox_2d\": [293, 495, 908, 523], \"label\": \"Text\"}, {\"bbox_2d\": [546, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 83, 626, 98], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 102, 667, 118], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 122, 482, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 141, 678, 157], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 161, 564, 176], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 181, 615, 196], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 200, 639, 216], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 220, 494, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 239, 503, 255], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 259, 428, 274], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 278, 486, 294], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 298, 696, 313], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 318, 696, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 384, 918, 428], \"label\": \"Text\"}, {\"bbox_2d\": [293, 446, 896, 475], \"label\": \"Text\"}, {\"bbox_2d\": [293, 493, 908, 522], \"label\": \"Text\"}, {\"bbox_2d\": [121, 355, 468, 372], \"label\": \"Section-header\"}, {\"bbox_2d\": [546, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 84, 426, 100], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 101, 917, 144], \"label\": \"Text\"}, {\"bbox_2d\": [293, 173, 798, 185], \"label\": \"Caption\"}, {\"bbox_2d\": [311, 206, 776, 488], \"label\": \"Text\"}, {\"bbox_2d\": [293, 536, 907, 564], \"label\": \"Text\"}, {\"bbox_2d\": [293, 583, 918, 627], \"label\": \"Text\"}, {\"bbox_2d\": [293, 646, 871, 674], \"label\": \"Text\"}, {\"bbox_2d\": [293, 693, 920, 737], \"label\": \"Text\"}, {\"bbox_2d\": [293, 520, 470, 535], \"label\": \"Section-header\"}, {\"bbox_2d\": [418, 979, 933, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 430, 446, 488], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 425, 99], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 98, 912, 145], \"label\": \"Text\"}, {\"bbox_2d\": [293, 173, 800, 184], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 520, 470, 535], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 536, 907, 565], \"label\": \"Text\"}, {\"bbox_2d\": [293, 583, 918, 628], \"label\": \"Text\"}, {\"bbox_2d\": [293, 646, 872, 675], \"label\": \"Text\"}, {\"bbox_2d\": [293, 693, 920, 737], \"label\": \"Text\"}, {\"bbox_2d\": [310, 205, 776, 487], \"label\": \"Text\"}, {\"bbox_2d\": [418, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [291, 84, 887, 113], \"label\": \"Text\"}, {\"bbox_2d\": [272, 119, 881, 148], \"label\": \"List-item\"}, {\"bbox_2d\": [272, 154, 896, 215], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 233, 541, 255], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 311, 539, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 791, 308, 811], \"label\": \"Section-header\"}, {\"bbox_2d\": [199, 924, 573, 944], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 979, 375, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [272, 949, 881, 963], \"label\": \"Text\"}, {\"bbox_2d\": [272, 818, 915, 861], \"label\": \"Text\"}, {\"bbox_2d\": [272, 728, 913, 775], \"label\": \"Text\"}, {\"bbox_2d\": [272, 461, 849, 475], \"label\": \"Text\"}, {\"bbox_2d\": [272, 336, 915, 443], \"label\": \"Text\"}, {\"bbox_2d\": [272, 265, 867, 294], \"label\": \"Text\"}, {\"bbox_2d\": [272, 501, 642, 685], \"label\": \"Picture\"}, {\"bbox_2d\": [272, 702, 556, 715], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [291, 85, 883, 114], \"label\": \"Text\"}, {\"bbox_2d\": [273, 118, 874, 149], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 153, 890, 216], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 238, 541, 255], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 313, 534, 330], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 266, 861, 295], \"label\": \"Text\"}, {\"bbox_2d\": [273, 337, 909, 444], \"label\": \"Text\"}, {\"bbox_2d\": [273, 462, 844, 475], \"label\": \"Text\"}, {\"bbox_2d\": [272, 704, 553, 715], \"label\": \"Caption\"}, {\"bbox_2d\": [269, 499, 641, 690], \"label\": \"Picture\"}, {\"bbox_2d\": [273, 731, 908, 776], \"label\": \"Text\"}, {\"bbox_2d\": [273, 817, 911, 862], \"label\": \"Text\"}, {\"bbox_2d\": [273, 880, 855, 909], \"label\": \"Text\"}, {\"bbox_2d\": [273, 950, 876, 964], \"label\": \"Text\"}, {\"bbox_2d\": [197, 794, 302, 811], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 927, 567, 944], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [152, 107, 905, 205], \"label\": \"Text\"}, {\"bbox_2d\": [133, 207, 893, 285], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 369, 724, 385], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 449, 901, 587], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 590, 876, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 712, 901, 910], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 913, 877, 931], \"label\": \"List-item\"}, {\"bbox_2d\": [189, 672, 869, 690], \"label\": \"Section-header\"}, {\"bbox_2d\": [189, 409, 858, 427], \"label\": \"Section-header\"}, {\"bbox_2d\": [189, 309, 862, 346], \"label\": \"Section-header\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [152, 106, 905, 203], \"label\": \"Text\"}, {\"bbox_2d\": [133, 913, 877, 929], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 712, 904, 909], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 591, 876, 647], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 449, 901, 586], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 368, 729, 385], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 207, 893, 284], \"label\": \"List-item\"}, {\"bbox_2d\": [190, 308, 858, 344], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 409, 860, 425], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 671, 871, 687], \"label\": \"Section-header\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [18, 401, 150, 641], \"label\": \"Page-header\"}, {\"bbox_2d\": [62, 44, 958, 988], \"label\": \"Table\"}, {\"bbox_2d\": [508, 997, 527, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [508, 992, 527, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [22, 402, 41, 633], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 57, 931, 989], \"label\": \"Table\"}, {\"bbox_2d\": [60, 41, 383, 64], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 84, 705, 201], \"label\": \"Text\"}, {\"bbox_2d\": [293, 814, 931, 844], \"label\": \"Text\"}, {\"bbox_2d\": [293, 861, 903, 922], \"label\": \"Text\"}, {\"bbox_2d\": [525, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 784, 385, 804], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 754, 495, 766], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 257, 807, 743], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [293, 83, 597, 96], \"label\": \"Text\"}, {\"bbox_2d\": [301, 109, 651, 135], \"label\": \"Text\"}, {\"bbox_2d\": [293, 148, 704, 174], \"label\": \"Text\"}, {\"bbox_2d\": [301, 187, 476, 201], \"label\": \"Text\"}, {\"bbox_2d\": [293, 815, 931, 844], \"label\": \"Text\"}, {\"bbox_2d\": [293, 862, 903, 922], \"label\": \"Text\"}, {\"bbox_2d\": [526, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 786, 385, 803], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 755, 497, 766], \"label\": \"Caption\"}, {\"bbox_2d\": [294, 256, 805, 740], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [61, 47, 1035, 635], \"label\": \"Picture\"}, {\"bbox_2d\": [455, 651, 663, 881], \"label\": \"Caption\"}, {\"bbox_2d\": [731, 651, 931, 881], \"label\": \"Caption\"}, {\"bbox_2d\": [62, 994, 79, 1006], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [452, 651, 657, 881], \"label\": \"Text\"}, {\"bbox_2d\": [731, 651, 931, 881], \"label\": \"Text\"}, {\"bbox_2d\": [62, 998, 77, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [59, 46, 1035, 637], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [155, 129, 879, 174], \"label\": \"Text\"}, {\"bbox_2d\": [155, 210, 880, 441], \"label\": \"Text\"}, {\"bbox_2d\": [155, 480, 880, 650], \"label\": \"Text\"}, {\"bbox_2d\": [155, 755, 880, 759], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 788, 485, 800], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 703, 308, 722], \"label\": \"Section-header\"}, {\"bbox_2d\": [507, 878, 528, 891], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [155, 129, 880, 174], \"label\": \"Text\"}, {\"bbox_2d\": [155, 208, 880, 441], \"label\": \"Text\"}, {\"bbox_2d\": [155, 480, 880, 650], \"label\": \"Text\"}, {\"bbox_2d\": [155, 703, 308, 723], \"label\": \"Section-header\"}, {\"bbox_2d\": [155, 754, 880, 799], \"label\": \"List-item\"}, {\"bbox_2d\": [508, 877, 527, 891], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [302, 97, 733, 110], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 891, 914, 926], \"label\": \"Text\"}, {\"bbox_2d\": [171, 112, 866, 873], \"label\": \"Table\"}, {\"bbox_2d\": [509, 970, 526, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 888, 914, 925], \"label\": \"Text\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [172, 113, 871, 876], \"label\": \"Table\"}, {\"bbox_2d\": [301, 98, 734, 109], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [430, 27, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [111, 101, 927, 185], \"label\": \"Text\"}, {\"bbox_2d\": [111, 211, 921, 273], \"label\": \"Text\"}, {\"bbox_2d\": [111, 324, 476, 346], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 372, 927, 434], \"label\": \"Text\"}, {\"bbox_2d\": [111, 486, 413, 504], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 535, 392, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 585, 449, 603], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 635, 592, 653], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [110, 98, 927, 186], \"label\": \"Text\"}, {\"bbox_2d\": [110, 209, 928, 275], \"label\": \"Text\"}, {\"bbox_2d\": [110, 369, 928, 435], \"label\": \"Text\"}, {\"bbox_2d\": [110, 483, 414, 507], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 533, 392, 556], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 582, 449, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 631, 592, 654], \"label\": \"List-item\"}, {\"bbox_2d\": [110, 315, 478, 349], \"label\": \"Section-header\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [138, 93, 578, 111], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 286, 645, 304], \"label\": \"List-item\"}, {\"bbox_2d\": [119, 408, 911, 587], \"label\": \"Text\"}, {\"bbox_2d\": [119, 677, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [119, 631, 653, 645], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 222, 911, 265], \"label\": \"Text\"}, {\"bbox_2d\": [365, 121, 911, 214], \"label\": \"Formula\"}, {\"bbox_2d\": [312, 317, 911, 397], \"label\": \"Formula\"}, {\"bbox_2d\": [474, 397, 911, 415], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [138, 95, 578, 122], \"label\": \"Text\"}, {\"bbox_2d\": [364, 121, 911, 185], \"label\": \"Formula\"}, {\"bbox_2d\": [457, 178, 911, 215], \"label\": \"Formula\"}, {\"bbox_2d\": [422, 302, 911, 353], \"label\": \"Formula\"}, {\"bbox_2d\": [471, 362, 911, 399], \"label\": \"Formula\"}, {\"bbox_2d\": [172, 221, 911, 264], \"label\": \"Text\"}, {\"bbox_2d\": [135, 287, 646, 314], \"label\": \"Text\"}, {\"bbox_2d\": [121, 408, 911, 588], \"label\": \"Text\"}, {\"bbox_2d\": [121, 676, 911, 965], \"label\": \"Text\"}, {\"bbox_2d\": [121, 631, 652, 644], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [106, 39, 480, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [106, 417, 943, 619], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 649, 501, 721], \"label\": \"Text\"}, {\"bbox_2d\": [106, 731, 501, 925], \"label\": \"Text\"}, {\"bbox_2d\": [527, 649, 926, 772], \"label\": \"Text\"}, {\"bbox_2d\": [527, 787, 926, 840], \"label\": \"Text\"}, {\"bbox_2d\": [106, 999, 281, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [107, 37, 477, 57], \"label\": \"Section-header\"}, {\"bbox_2d\": [0, 84, 1033, 404], \"label\": \"Picture\"}, {\"bbox_2d\": [103, 409, 964, 624], \"label\": \"Section-header\"}, {\"bbox_2d\": [107, 650, 504, 718], \"label\": \"Text\"}, {\"bbox_2d\": [528, 650, 925, 770], \"label\": \"Text\"}, {\"bbox_2d\": [528, 786, 931, 835], \"label\": \"Text\"}, {\"bbox_2d\": [107, 733, 496, 924], \"label\": \"Text\"}, {\"bbox_2d\": [106, 1000, 281, 1009], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [294, 85, 645, 98], \"label\": \"Caption\"}, {\"bbox_2d\": [294, 760, 885, 804], \"label\": \"Text\"}, {\"bbox_2d\": [294, 95, 934, 419], \"label\": \"Table\"}, {\"bbox_2d\": [318, 442, 686, 714], \"label\": \"Text\"}, {\"bbox_2d\": [703, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 86, 643, 97], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 97, 933, 420], \"label\": \"Table\"}, {\"bbox_2d\": [479, 442, 675, 454], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 465, 844, 712], \"label\": \"Text\"}, {\"bbox_2d\": [293, 761, 880, 805], \"label\": \"Text\"}, {\"bbox_2d\": [703, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [654, 35, 898, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 35, 319, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 65, 898, 98], \"label\": \"Text\"}, {\"bbox_2d\": [168, 117, 582, 134], \"label\": \"List-item\"}, {\"bbox_2d\": [168, 134, 880, 150], \"label\": \"List-item\"}, {\"bbox_2d\": [168, 150, 875, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 202, 898, 235], \"label\": \"Text\"}, {\"bbox_2d\": [138, 254, 898, 355], \"label\": \"Text\"}, {\"bbox_2d\": [138, 374, 898, 439], \"label\": \"Text\"}, {\"bbox_2d\": [138, 493, 898, 593], \"label\": \"Text\"}, {\"bbox_2d\": [138, 612, 898, 645], \"label\": \"Text\"}, {\"bbox_2d\": [138, 664, 898, 696], \"label\": \"Text\"}, {\"bbox_2d\": [138, 714, 898, 762], \"label\": \"Text\"}, {\"bbox_2d\": [138, 820, 898, 852], \"label\": \"Text\"}, {\"bbox_2d\": [138, 904, 898, 952], \"label\": \"Text\"}, {\"bbox_2d\": [533, 973, 549, 985], \"label\": \"Page-footer\"}, {\"bbox_2d\": [138, 869, 386, 886], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 782, 313, 799], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 458, 424, 475], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [138, 35, 322, 47], \"label\": \"Text\"}, {\"bbox_2d\": [654, 35, 903, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 65, 902, 98], \"label\": \"Text\"}, {\"bbox_2d\": [169, 116, 583, 132], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 133, 881, 149], \"label\": \"List-item\"}, {\"bbox_2d\": [169, 150, 875, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [138, 201, 902, 235], \"label\": \"Text\"}, {\"bbox_2d\": [138, 252, 903, 354], \"label\": \"Text\"}, {\"bbox_2d\": [138, 371, 903, 439], \"label\": \"Text\"}, {\"bbox_2d\": [138, 457, 430, 473], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 491, 903, 592], \"label\": \"Text\"}, {\"bbox_2d\": [138, 610, 903, 643], \"label\": \"Text\"}, {\"bbox_2d\": [138, 661, 903, 694], \"label\": \"Text\"}, {\"bbox_2d\": [138, 712, 903, 763], \"label\": \"Text\"}, {\"bbox_2d\": [138, 781, 319, 800], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 869, 392, 885], \"label\": \"Section-header\"}, {\"bbox_2d\": [138, 817, 902, 851], \"label\": \"Text\"}, {\"bbox_2d\": [138, 903, 903, 953], \"label\": \"Text\"}, {\"bbox_2d\": [534, 972, 554, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 84, 908, 143], \"label\": \"Text\"}, {\"bbox_2d\": [273, 164, 912, 254], \"label\": \"Text\"}, {\"bbox_2d\": [273, 306, 914, 334], \"label\": \"Text\"}, {\"bbox_2d\": [273, 391, 907, 420], \"label\": \"Text\"}, {\"bbox_2d\": [273, 472, 909, 499], \"label\": \"Text\"}, {\"bbox_2d\": [273, 573, 901, 601], \"label\": \"Text\"}, {\"bbox_2d\": [273, 653, 880, 681], \"label\": \"Text\"}, {\"bbox_2d\": [273, 754, 645, 766], \"label\": \"Text\"}, {\"bbox_2d\": [273, 787, 888, 798], \"label\": \"Text\"}, {\"bbox_2d\": [273, 689, 909, 716], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 723, 389, 735], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 541, 389, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 506, 907, 534], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 363, 389, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 340, 891, 352], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 277, 440, 293], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 442, 601, 458], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 623, 440, 640], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 902, 145], \"label\": \"Text\"}, {\"bbox_2d\": [273, 163, 907, 255], \"label\": \"Text\"}, {\"bbox_2d\": [101, 277, 440, 294], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 443, 601, 459], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 624, 440, 641], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 754, 645, 768], \"label\": \"Text\"}, {\"bbox_2d\": [273, 786, 885, 799], \"label\": \"Text\"}, {\"bbox_2d\": [273, 653, 873, 682], \"label\": \"Text\"}, {\"bbox_2d\": [273, 573, 894, 602], \"label\": \"Text\"}, {\"bbox_2d\": [273, 471, 903, 500], \"label\": \"Text\"}, {\"bbox_2d\": [273, 392, 902, 421], \"label\": \"Text\"}, {\"bbox_2d\": [273, 305, 909, 334], \"label\": \"Text\"}, {\"bbox_2d\": [273, 338, 888, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 360, 388, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 686, 905, 717], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 723, 388, 736], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 504, 901, 535], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 542, 388, 555], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [228, 92, 926, 884], \"label\": \"Table\"}, {\"bbox_2d\": [231, 902, 780, 917], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 922, 926, 963], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 692, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [233, 91, 924, 883], \"label\": \"Table\"}, {\"bbox_2d\": [231, 900, 780, 915], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 920, 926, 963], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 694, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [135, 100, 474, 141], \"label\": \"Text\"}, {\"bbox_2d\": [546, 100, 907, 187], \"label\": \"Text\"}, {\"bbox_2d\": [546, 221, 907, 367], \"label\": \"Text\"}, {\"bbox_2d\": [546, 387, 907, 893], \"label\": \"Text\"}, {\"bbox_2d\": [546, 900, 897, 968], \"label\": \"Text\"}, {\"bbox_2d\": [135, 146, 485, 276], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 283, 476, 547], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 551, 254, 563], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 567, 485, 652], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 657, 338, 669], \"label\": \"List-item\"}, {\"bbox_2d\": [135, 674, 475, 770], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 775, 241, 787], \"label\": \"List-item\"}, {\"bbox_2d\": [135, 792, 477, 893], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 897, 449, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 927, 468, 968], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 191, 869, 221], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 371, 676, 383], \"label\": \"List-item\"}, {\"bbox_2d\": [519, 221, 897, 249], \"label\": \"List-item\"}, {\"bbox_2d\": [60, 999, 170, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [35, 15, 84, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 18, 353, 29], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [132, 101, 468, 142], \"label\": \"Text\"}, {\"bbox_2d\": [546, 101, 912, 187], \"label\": \"Text\"}, {\"bbox_2d\": [546, 222, 916, 367], \"label\": \"Text\"}, {\"bbox_2d\": [132, 161, 485, 277], \"label\": \"Text\"}, {\"bbox_2d\": [546, 387, 913, 637], \"label\": \"Text\"}, {\"bbox_2d\": [546, 642, 919, 893], \"label\": \"Text\"}, {\"bbox_2d\": [546, 897, 897, 968], \"label\": \"Text\"}, {\"bbox_2d\": [113, 927, 472, 968], \"label\": \"Text\"}, {\"bbox_2d\": [132, 792, 482, 893], \"label\": \"Text\"}, {\"bbox_2d\": [132, 672, 481, 772], \"label\": \"Text\"}, {\"bbox_2d\": [132, 582, 485, 652], \"label\": \"Text\"}, {\"bbox_2d\": [132, 297, 483, 547], \"label\": \"Text\"}, {\"bbox_2d\": [113, 146, 441, 157], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 192, 874, 217], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 282, 410, 292], \"label\": \"List-item\"}, {\"bbox_2d\": [520, 372, 676, 382], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 552, 254, 562], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 567, 294, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 657, 336, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [113, 777, 353, 787], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 897, 453, 923], \"label\": \"List-item\"}, {\"bbox_2d\": [58, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [74, 47, 278, 75], \"label\": \"Page-header\"}, {\"bbox_2d\": [74, 90, 488, 179], \"label\": \"Text\"}, {\"bbox_2d\": [74, 185, 488, 330], \"label\": \"Text\"}, {\"bbox_2d\": [74, 335, 488, 649], \"label\": \"Text\"}, {\"bbox_2d\": [526, 410, 940, 651], \"label\": \"Text\"}, {\"bbox_2d\": [526, 88, 940, 124], \"label\": \"Text\"}, {\"bbox_2d\": [526, 129, 940, 405], \"label\": \"Text\"}, {\"bbox_2d\": [489, 977, 516, 989], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [75, 43, 293, 75], \"label\": \"Page-header\"}, {\"bbox_2d\": [528, 85, 947, 122], \"label\": \"Text\"}, {\"bbox_2d\": [73, 87, 498, 179], \"label\": \"Text\"}, {\"bbox_2d\": [526, 122, 952, 403], \"label\": \"Text\"}, {\"bbox_2d\": [73, 180, 497, 328], \"label\": \"Text\"}, {\"bbox_2d\": [71, 329, 498, 646], \"label\": \"Text\"}, {\"bbox_2d\": [525, 404, 953, 648], \"label\": \"Text\"}, {\"bbox_2d\": [489, 972, 522, 986], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [790, 73, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 94, 523, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 143, 500, 204], \"label\": \"Text\"}, {\"bbox_2d\": [130, 210, 500, 231], \"label\": \"Text\"}, {\"bbox_2d\": [130, 237, 500, 273], \"label\": \"Text\"}, {\"bbox_2d\": [130, 279, 500, 315], \"label\": \"Text\"}, {\"bbox_2d\": [130, 322, 500, 343], \"label\": \"Text\"}, {\"bbox_2d\": [130, 349, 500, 410], \"label\": \"Text\"}, {\"bbox_2d\": [526, 143, 895, 251], \"label\": \"Text\"}, {\"bbox_2d\": [526, 257, 894, 343], \"label\": \"Text\"}, {\"bbox_2d\": [526, 349, 895, 395], \"label\": \"Text\"}, {\"bbox_2d\": [526, 396, 895, 408], \"label\": \"Text\"}, {\"bbox_2d\": [157, 459, 339, 509], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 553, 343, 599], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 455, 709, 506], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 547, 806, 658], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 674, 768, 742], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 827, 833, 947], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 678, 447, 790], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 832, 421, 947], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 142, 499, 204], \"label\": \"Text\"}, {\"bbox_2d\": [131, 208, 499, 231], \"label\": \"Text\"}, {\"bbox_2d\": [131, 236, 499, 273], \"label\": \"Text\"}, {\"bbox_2d\": [131, 278, 499, 314], \"label\": \"Text\"}, {\"bbox_2d\": [131, 319, 500, 355], \"label\": \"Text\"}, {\"bbox_2d\": [131, 360, 499, 409], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 894, 252], \"label\": \"Text\"}, {\"bbox_2d\": [526, 255, 895, 331], \"label\": \"Text\"}, {\"bbox_2d\": [526, 334, 895, 370], \"label\": \"Text\"}, {\"bbox_2d\": [526, 373, 895, 409], \"label\": \"Text\"}, {\"bbox_2d\": [158, 450, 523, 508], \"label\": \"Picture\"}, {\"bbox_2d\": [531, 450, 899, 509], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 542, 525, 599], \"label\": \"Picture\"}, {\"bbox_2d\": [532, 542, 898, 662], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 672, 524, 792], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 672, 898, 736], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 825, 525, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [532, 825, 898, 946], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [84, 77, 990, 971], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 77, 990, 970], \"label\": \"Table\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 310, 106], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 114, 911, 190], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 202, 926, 245], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 334, 901, 380], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 390, 926, 433], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 444, 901, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 519, 915, 578], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 589, 926, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 254, 310, 266], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 314, 310, 326], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 497, 310, 510], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 668, 363, 684], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 689, 926, 749], \"label\": \"Text\"}, {\"bbox_2d\": [231, 765, 926, 809], \"label\": \"Text\"}, {\"bbox_2d\": [241, 833, 908, 876], \"label\": \"Text\"}, {\"bbox_2d\": [231, 922, 778, 935], \"label\": \"Text\"}, {\"bbox_2d\": [231, 901, 373, 917], \"label\": \"Section-header\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 309, 105], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 254, 309, 266], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 200, 926, 244], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 115, 915, 190], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 314, 309, 326], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 336, 902, 380], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 390, 926, 433], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 444, 903, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 497, 309, 509], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 519, 914, 579], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 589, 926, 648], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 670, 362, 684], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 690, 926, 749], \"label\": \"Text\"}, {\"bbox_2d\": [231, 765, 926, 809], \"label\": \"Text\"}, {\"bbox_2d\": [241, 833, 911, 877], \"label\": \"Text\"}, {\"bbox_2d\": [231, 922, 778, 934], \"label\": \"Text\"}, {\"bbox_2d\": [231, 903, 373, 917], \"label\": \"Section-header\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [255, 276, 872, 304], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [18, 92, 359, 263], \"label\": \"Table\"}, {\"bbox_2d\": [18, 344, 471, 422], \"label\": \"Text\"}, {\"bbox_2d\": [18, 37, 110, 81], \"label\": \"Section-header\"}, {\"bbox_2d\": [547, 50, 910, 81], \"label\": \"Section-header\"}, {\"bbox_2d\": [18, 290, 281, 327], \"label\": \"Section-header\"}, {\"bbox_2d\": [18, 614, 222, 648], \"label\": \"Section-header\"}, {\"bbox_2d\": [545, 531, 825, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [545, 701, 764, 734], \"label\": \"Section-header\"}, {\"bbox_2d\": [18, 427, 463, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 494, 456, 532], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 534, 461, 576], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 580, 300, 597], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 703, 415, 720], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 723, 358, 740], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 744, 456, 862], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 866, 429, 883], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 887, 455, 945], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 587, 992, 686], \"label\": \"Text\"}, {\"bbox_2d\": [545, 757, 983, 918], \"label\": \"Text\"}, {\"bbox_2d\": [18, 663, 465, 701], \"label\": \"Text\"}, {\"bbox_2d\": [525, 94, 998, 500], \"label\": \"Picture\"}, {\"bbox_2d\": [968, 962, 994, 982], \"label\": \"Page-footer\"}, {\"bbox_2d\": [298, 952, 470, 988], \"label\": \"Picture\"}, {\"bbox_2d\": [542, 951, 712, 988], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [21, 43, 109, 80], \"label\": \"Section-header\"}, {\"bbox_2d\": [17, 287, 278, 323], \"label\": \"Section-header\"}, {\"bbox_2d\": [21, 609, 221, 646], \"label\": \"Section-header\"}, {\"bbox_2d\": [548, 698, 764, 735], \"label\": \"Section-header\"}, {\"bbox_2d\": [546, 527, 824, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [549, 43, 910, 80], \"label\": \"Section-header\"}, {\"bbox_2d\": [16, 89, 392, 265], \"label\": \"Table\"}, {\"bbox_2d\": [16, 340, 473, 422], \"label\": \"Text\"}, {\"bbox_2d\": [16, 423, 462, 488], \"label\": \"List-item\"}, {\"bbox_2d\": [16, 489, 456, 532], \"label\": \"List-item\"}, {\"bbox_2d\": [16, 533, 461, 576], \"label\": \"List-item\"}, {\"bbox_2d\": [16, 577, 300, 598], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 700, 414, 721], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 721, 357, 741], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 741, 456, 822], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 863, 426, 883], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 883, 453, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 755, 989, 918], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 584, 996, 686], \"label\": \"List-item\"}, {\"bbox_2d\": [543, 95, 1001, 497], \"label\": \"Picture\"}, {\"bbox_2d\": [974, 958, 984, 978], \"label\": \"Page-footer\"}, {\"bbox_2d\": [30, 958, 39, 979], \"label\": \"Page-footer\"}, {\"bbox_2d\": [18, 660, 471, 701], \"label\": \"Text\"}, {\"bbox_2d\": [35, 822, 154, 863], \"label\": \"Text\"}, {\"bbox_2d\": [299, 949, 470, 990], \"label\": \"Picture\"}, {\"bbox_2d\": [542, 949, 714, 989], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [92, 94, 252, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 109, 506, 148], \"label\": \"Text\"}, {\"bbox_2d\": [92, 161, 506, 185], \"label\": \"Text\"}, {\"bbox_2d\": [92, 286, 510, 308], \"label\": \"Text\"}, {\"bbox_2d\": [92, 318, 257, 341], \"label\": \"Text\"}, {\"bbox_2d\": [92, 353, 264, 376], \"label\": \"Text\"}, {\"bbox_2d\": [92, 378, 510, 407], \"label\": \"Text\"}, {\"bbox_2d\": [92, 412, 508, 436], \"label\": \"Text\"}, {\"bbox_2d\": [92, 440, 510, 485], \"label\": \"Text\"}, {\"bbox_2d\": [92, 494, 508, 529], \"label\": \"Text\"}, {\"bbox_2d\": [92, 529, 258, 565], \"label\": \"Text\"}, {\"bbox_2d\": [92, 577, 510, 629], \"label\": \"Text\"}, {\"bbox_2d\": [92, 645, 506, 683], \"label\": \"Text\"}, {\"bbox_2d\": [92, 825, 506, 850], \"label\": \"Text\"}, {\"bbox_2d\": [528, 110, 943, 162], \"label\": \"Text\"}, {\"bbox_2d\": [528, 175, 943, 228], \"label\": \"Text\"}, {\"bbox_2d\": [528, 297, 943, 336], \"label\": \"Text\"}, {\"bbox_2d\": [528, 349, 813, 359], \"label\": \"Text\"}, {\"bbox_2d\": [528, 511, 919, 520], \"label\": \"Text\"}, {\"bbox_2d\": [584, 999, 943, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [92, 689, 510, 809], \"label\": \"Table\"}, {\"bbox_2d\": [92, 197, 510, 272], \"label\": \"Table\"}, {\"bbox_2d\": [528, 366, 947, 502], \"label\": \"Table\"}, {\"bbox_2d\": [528, 94, 636, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 233, 943, 286], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 261, 943, 286], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 228, 943, 252], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 67, 997, 314], \"label\": \"Picture\"}, {\"bbox_2d\": [92, 536, 510, 629], \"label\": \"Table\"}, {\"bbox_2d\": [92, 41, 997, 619], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 343], \"label\": \"Page-header\"}, {\"bbox_2d\": [584, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [89, 690, 512, 811], \"label\": \"Table\"}, {\"bbox_2d\": [89, 201, 512, 631], \"label\": \"Table\"}, {\"bbox_2d\": [526, 366, 946, 503], \"label\": \"Table\"}, {\"bbox_2d\": [528, 511, 919, 519], \"label\": \"Text\"}, {\"bbox_2d\": [528, 349, 925, 359], \"label\": \"Text\"}, {\"bbox_2d\": [91, 644, 507, 683], \"label\": \"Text\"}, {\"bbox_2d\": [91, 825, 507, 849], \"label\": \"Text\"}, {\"bbox_2d\": [528, 109, 944, 163], \"label\": \"Text\"}, {\"bbox_2d\": [528, 175, 944, 199], \"label\": \"Text\"}, {\"bbox_2d\": [91, 109, 507, 148], \"label\": \"Text\"}, {\"bbox_2d\": [91, 160, 507, 185], \"label\": \"Text\"}, {\"bbox_2d\": [528, 95, 636, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 95, 253, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 204, 944, 228], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 232, 944, 257], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 261, 944, 286], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 298, 944, 337], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [791, 62, 973, 95], \"label\": \"Page-header\"}, {\"bbox_2d\": [113, 106, 528, 181], \"label\": \"Text\"}, {\"bbox_2d\": [113, 183, 532, 519], \"label\": \"Text\"}, {\"bbox_2d\": [132, 521, 525, 651], \"label\": \"Text\"}, {\"bbox_2d\": [132, 654, 525, 744], \"label\": \"Text\"}, {\"bbox_2d\": [133, 747, 525, 984], \"label\": \"Text\"}, {\"bbox_2d\": [565, 106, 978, 181], \"label\": \"Text\"}, {\"bbox_2d\": [586, 183, 980, 237], \"label\": \"Text\"}, {\"bbox_2d\": [586, 239, 978, 292], \"label\": \"Text\"}, {\"bbox_2d\": [586, 294, 978, 365], \"label\": \"Text\"}, {\"bbox_2d\": [565, 369, 980, 481], \"label\": \"Text\"}, {\"bbox_2d\": [565, 484, 980, 764], \"label\": \"Text\"}, {\"bbox_2d\": [565, 766, 980, 984], \"label\": \"Text\"}, {\"bbox_2d\": [534, 993, 559, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [791, 62, 992, 94], \"label\": \"Page-header\"}, {\"bbox_2d\": [534, 988, 557, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [112, 105, 531, 179], \"label\": \"Text\"}, {\"bbox_2d\": [566, 106, 992, 180], \"label\": \"Text\"}, {\"bbox_2d\": [586, 180, 984, 236], \"label\": \"Text\"}, {\"bbox_2d\": [585, 236, 982, 291], \"label\": \"Text\"}, {\"bbox_2d\": [586, 292, 982, 366], \"label\": \"Text\"}, {\"bbox_2d\": [114, 180, 537, 516], \"label\": \"Text\"}, {\"bbox_2d\": [566, 367, 992, 479], \"label\": \"Text\"}, {\"bbox_2d\": [132, 517, 532, 647], \"label\": \"Text\"}, {\"bbox_2d\": [132, 648, 535, 740], \"label\": \"Text\"}, {\"bbox_2d\": [564, 479, 997, 759], \"label\": \"Text\"}, {\"bbox_2d\": [132, 741, 536, 979], \"label\": \"Text\"}, {\"bbox_2d\": [564, 760, 986, 978], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [29, 259, 61, 707], \"label\": \"Page-header\"}, {\"bbox_2d\": [137, 93, 895, 145], \"label\": \"Title\"}, {\"bbox_2d\": [234, 176, 797, 188], \"label\": \"Text\"}, {\"bbox_2d\": [289, 206, 746, 245], \"label\": \"Text\"}, {\"bbox_2d\": [402, 261, 630, 273], \"label\": \"Text\"}, {\"bbox_2d\": [179, 296, 853, 449], \"label\": \"Text\"}, {\"bbox_2d\": [121, 540, 911, 636], \"label\": \"Text\"}, {\"bbox_2d\": [121, 650, 911, 800], \"label\": \"Text\"}, {\"bbox_2d\": [121, 814, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [412, 495, 621, 507], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [31, 259, 61, 706], \"label\": \"Page-header\"}, {\"bbox_2d\": [136, 94, 895, 146], \"label\": \"Title\"}, {\"bbox_2d\": [234, 174, 798, 189], \"label\": \"Text\"}, {\"bbox_2d\": [287, 203, 745, 246], \"label\": \"Text\"}, {\"bbox_2d\": [401, 260, 630, 274], \"label\": \"Text\"}, {\"bbox_2d\": [179, 294, 854, 450], \"label\": \"Text\"}, {\"bbox_2d\": [121, 538, 911, 636], \"label\": \"Text\"}, {\"bbox_2d\": [121, 648, 911, 811], \"label\": \"Text\"}, {\"bbox_2d\": [121, 812, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [411, 493, 621, 508], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [143, 102, 901, 888], \"label\": \"Table\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [143, 104, 901, 888], \"label\": \"Table\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [189, 65, 853, 210], \"label\": \"Table\"}, {\"bbox_2d\": [91, 272, 506, 379], \"label\": \"Text\"}, {\"bbox_2d\": [536, 272, 794, 284], \"label\": \"Text\"}, {\"bbox_2d\": [536, 372, 951, 485], \"label\": \"Text\"}, {\"bbox_2d\": [91, 380, 506, 575], \"label\": \"Text\"}, {\"bbox_2d\": [91, 577, 506, 612], \"label\": \"Text\"}, {\"bbox_2d\": [536, 487, 951, 590], \"label\": \"Text\"}, {\"bbox_2d\": [536, 623, 951, 659], \"label\": \"Text\"}, {\"bbox_2d\": [536, 713, 856, 741], \"label\": \"Text\"}, {\"bbox_2d\": [536, 744, 721, 757], \"label\": \"Text\"}, {\"bbox_2d\": [536, 843, 951, 879], \"label\": \"Text\"}, {\"bbox_2d\": [536, 882, 951, 956], \"label\": \"Text\"}, {\"bbox_2d\": [536, 665, 951, 703], \"label\": \"Formula\"}, {\"bbox_2d\": [568, 601, 951, 637], \"label\": \"Formula\"}, {\"bbox_2d\": [122, 618, 506, 659], \"label\": \"Formula\"}, {\"bbox_2d\": [113, 779, 506, 827], \"label\": \"Formula\"}, {\"bbox_2d\": [590, 770, 951, 825], \"label\": \"Formula\"}, {\"bbox_2d\": [618, 829, 951, 872], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 874, 951, 896], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 665, 506, 746], \"label\": \"Text\"}, {\"bbox_2d\": [91, 748, 506, 827], \"label\": \"Text\"}, {\"bbox_2d\": [91, 829, 506, 843], \"label\": \"Text\"}, {\"bbox_2d\": [91, 932, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [513, 274, 846, 303], \"label\": \"Formula\"}, {\"bbox_2d\": [613, 321, 951, 380], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 217, 910, 230], \"label\": \"Caption\"}, {\"bbox_2d\": [513, 401, 951, 447], \"label\": \"Text\"}, {\"bbox_2d\": [536, 538, 951, 601], \"label\": \"Text\"}, {\"bbox_2d\": [536, 948, 951, 971], \"label\": \"Text\"}, {\"bbox_2d\": [513, 974, 522, 986], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [190, 65, 857, 203], \"label\": \"Table\"}, {\"bbox_2d\": [97, 217, 945, 229], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 268, 506, 376], \"label\": \"Text\"}, {\"bbox_2d\": [536, 271, 880, 285], \"label\": \"Text\"}, {\"bbox_2d\": [91, 379, 506, 581], \"label\": \"Text\"}, {\"bbox_2d\": [91, 576, 506, 613], \"label\": \"Text\"}, {\"bbox_2d\": [91, 663, 506, 778], \"label\": \"Text\"}, {\"bbox_2d\": [91, 829, 196, 842], \"label\": \"Text\"}, {\"bbox_2d\": [91, 930, 506, 952], \"label\": \"Text\"}, {\"bbox_2d\": [536, 887, 951, 939], \"label\": \"Text\"}, {\"bbox_2d\": [553, 934, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 868, 895, 892], \"label\": \"Text\"}, {\"bbox_2d\": [553, 744, 681, 757], \"label\": \"Text\"}, {\"bbox_2d\": [536, 712, 951, 750], \"label\": \"Text\"}, {\"bbox_2d\": [536, 624, 951, 661], \"label\": \"Text\"}, {\"bbox_2d\": [536, 393, 951, 494], \"label\": \"Text\"}, {\"bbox_2d\": [536, 488, 951, 571], \"label\": \"Text\"}, {\"bbox_2d\": [609, 301, 951, 382], \"label\": \"Formula\"}, {\"bbox_2d\": [126, 621, 506, 652], \"label\": \"Formula\"}, {\"bbox_2d\": [571, 579, 951, 610], \"label\": \"Formula\"}, {\"bbox_2d\": [560, 668, 951, 699], \"label\": \"Formula\"}, {\"bbox_2d\": [116, 785, 506, 816], \"label\": \"Formula\"}, {\"bbox_2d\": [169, 856, 506, 919], \"label\": \"Formula\"}, {\"bbox_2d\": [618, 773, 951, 855], \"label\": \"Formula\"}, {\"bbox_2d\": [517, 973, 525, 986], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [123, 89, 453, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 119, 914, 146], \"label\": \"Text\"}, {\"bbox_2d\": [292, 154, 910, 181], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 189, 534, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 209, 408, 221], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 240, 660, 251], \"label\": \"Text\"}, {\"bbox_2d\": [292, 583, 914, 610], \"label\": \"Text\"}, {\"bbox_2d\": [292, 630, 895, 657], \"label\": \"Text\"}, {\"bbox_2d\": [292, 677, 910, 689], \"label\": \"Text\"}, {\"bbox_2d\": [292, 699, 640, 711], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 712, 931, 945], \"label\": \"Table\"}, {\"bbox_2d\": [292, 259, 931, 555], \"label\": \"Picture\"}, {\"bbox_2d\": [632, 979, 933, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 89, 453, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 118, 914, 146], \"label\": \"Text\"}, {\"bbox_2d\": [293, 239, 660, 252], \"label\": \"Text\"}, {\"bbox_2d\": [293, 582, 914, 611], \"label\": \"Text\"}, {\"bbox_2d\": [293, 629, 894, 658], \"label\": \"Text\"}, {\"bbox_2d\": [293, 676, 910, 689], \"label\": \"Text\"}, {\"bbox_2d\": [293, 153, 910, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 186, 533, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 208, 408, 221], \"label\": \"List-item\"}, {\"bbox_2d\": [493, 263, 668, 275], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 701, 641, 712], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 712, 931, 948], \"label\": \"Table\"}, {\"bbox_2d\": [633, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [316, 286, 798, 417], \"label\": \"Table\"}, {\"bbox_2d\": [317, 533, 540, 545], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [241, 137, 913, 164], \"label\": \"Text\"}, {\"bbox_2d\": [231, 219, 884, 247], \"label\": \"Text\"}, {\"bbox_2d\": [231, 484, 911, 512], \"label\": \"Text\"}, {\"bbox_2d\": [109, 442, 381, 461], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 199, 484, 213], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 394, 510, 405], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 940, 435, 951], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 522, 926, 598], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 607, 926, 665], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 676, 926, 735], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 746, 926, 774], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 791, 822, 802], \"label\": \"Text\"}, {\"bbox_2d\": [231, 825, 591, 934], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 266, 784, 390], \"label\": \"Picture\"}, {\"bbox_2d\": [636, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"List-item\"}, {\"bbox_2d\": [241, 136, 912, 164], \"label\": \"Text\"}, {\"bbox_2d\": [231, 219, 885, 246], \"label\": \"Text\"}, {\"bbox_2d\": [231, 199, 484, 213], \"label\": \"Text\"}, {\"bbox_2d\": [231, 394, 509, 405], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 266, 783, 390], \"label\": \"Picture\"}, {\"bbox_2d\": [109, 442, 381, 461], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 484, 909, 512], \"label\": \"Text\"}, {\"bbox_2d\": [231, 522, 926, 597], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 607, 926, 666], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 676, 926, 735], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 746, 926, 773], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 790, 824, 802], \"label\": \"Text\"}, {\"bbox_2d\": [231, 820, 592, 936], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 939, 435, 950], \"label\": \"Caption\"}, {\"bbox_2d\": [635, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [138, 93, 894, 145], \"label\": \"Title\"}, {\"bbox_2d\": [350, 174, 682, 187], \"label\": \"Text\"}, {\"bbox_2d\": [145, 204, 835, 217], \"label\": \"Text\"}, {\"bbox_2d\": [145, 232, 889, 245], \"label\": \"Text\"}, {\"bbox_2d\": [178, 271, 853, 766], \"label\": \"Text\"}, {\"bbox_2d\": [178, 795, 589, 808], \"label\": \"Text\"}, {\"bbox_2d\": [178, 824, 491, 836], \"label\": \"Text\"}, {\"bbox_2d\": [34, 263, 62, 703], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [138, 93, 895, 146], \"label\": \"Title\"}, {\"bbox_2d\": [349, 172, 682, 189], \"label\": \"Text\"}, {\"bbox_2d\": [193, 204, 835, 217], \"label\": \"Text\"}, {\"bbox_2d\": [146, 232, 886, 245], \"label\": \"Text\"}, {\"bbox_2d\": [179, 270, 853, 766], \"label\": \"Text\"}, {\"bbox_2d\": [179, 795, 588, 808], \"label\": \"Text\"}, {\"bbox_2d\": [179, 824, 492, 837], \"label\": \"Text\"}, {\"bbox_2d\": [31, 264, 61, 701], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [155, 126, 880, 236], \"label\": \"Text\"}, {\"bbox_2d\": [155, 253, 880, 518], \"label\": \"Text\"}, {\"bbox_2d\": [155, 552, 880, 629], \"label\": \"Text\"}, {\"bbox_2d\": [155, 693, 880, 741], \"label\": \"Text\"}, {\"bbox_2d\": [155, 774, 880, 849], \"label\": \"Text\"}, {\"bbox_2d\": [155, 662, 506, 675], \"label\": \"Text\"}, {\"bbox_2d\": [508, 878, 527, 891], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [155, 125, 880, 236], \"label\": \"Text\"}, {\"bbox_2d\": [155, 253, 880, 518], \"label\": \"Text\"}, {\"bbox_2d\": [155, 551, 880, 629], \"label\": \"Text\"}, {\"bbox_2d\": [155, 663, 575, 677], \"label\": \"Text\"}, {\"bbox_2d\": [155, 694, 880, 739], \"label\": \"Text\"}, {\"bbox_2d\": [155, 773, 880, 850], \"label\": \"Text\"}, {\"bbox_2d\": [508, 877, 527, 891], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 85, 428, 97], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 103, 1009, 226], \"label\": \"Text\"}, {\"bbox_2d\": [25, 231, 1009, 279], \"label\": \"Text\"}, {\"bbox_2d\": [25, 284, 1009, 317], \"label\": \"Text\"}, {\"bbox_2d\": [24, 328, 1011, 592], \"label\": \"Table\"}, {\"bbox_2d\": [504, 661, 529, 671], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 85, 428, 96], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 102, 1009, 226], \"label\": \"Text\"}, {\"bbox_2d\": [25, 233, 1009, 277], \"label\": \"Text\"}, {\"bbox_2d\": [25, 284, 1009, 317], \"label\": \"Text\"}, {\"bbox_2d\": [504, 661, 529, 672], \"label\": \"Page-footer\"}, {\"bbox_2d\": [24, 330, 1009, 592], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 138, 629, 153], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 232, 489, 247], \"label\": \"Section-header\"}, {\"bbox_2d\": [131, 275, 944, 365], \"label\": \"Text\"}, {\"bbox_2d\": [131, 390, 944, 462], \"label\": \"Text\"}, {\"bbox_2d\": [131, 487, 935, 634], \"label\": \"Text\"}, {\"bbox_2d\": [131, 659, 931, 730], \"label\": \"Text\"}, {\"bbox_2d\": [131, 755, 908, 827], \"label\": \"Text\"}, {\"bbox_2d\": [131, 851, 931, 884], \"label\": \"Text\"}, {\"bbox_2d\": [131, 924, 205, 953], \"label\": \"Caption\"}, {\"bbox_2d\": [131, 893, 319, 913], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [91, 135, 630, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 231, 490, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [130, 274, 945, 367], \"label\": \"Text\"}, {\"bbox_2d\": [130, 389, 945, 463], \"label\": \"Text\"}, {\"bbox_2d\": [130, 485, 935, 635], \"label\": \"Text\"}, {\"bbox_2d\": [130, 658, 932, 731], \"label\": \"Text\"}, {\"bbox_2d\": [130, 754, 911, 827], \"label\": \"Text\"}, {\"bbox_2d\": [130, 850, 932, 885], \"label\": \"Text\"}, {\"bbox_2d\": [130, 922, 205, 951], \"label\": \"Caption\"}, {\"bbox_2d\": [131, 891, 320, 915], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [526, 37, 546, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [119, 75, 727, 93], \"label\": \"Text\"}, {\"bbox_2d\": [119, 95, 958, 151], \"label\": \"Text\"}, {\"bbox_2d\": [119, 158, 958, 274], \"label\": \"Text\"}, {\"bbox_2d\": [119, 276, 958, 320], \"label\": \"Text\"}, {\"bbox_2d\": [119, 317, 958, 348], \"label\": \"Text\"}, {\"bbox_2d\": [119, 354, 958, 391], \"label\": \"Text\"}, {\"bbox_2d\": [119, 395, 958, 426], \"label\": \"Text\"}, {\"bbox_2d\": [119, 431, 958, 548], \"label\": \"Text\"}, {\"bbox_2d\": [119, 552, 958, 586], \"label\": \"Text\"}, {\"bbox_2d\": [165, 588, 958, 619], \"label\": \"Text\"}, {\"bbox_2d\": [113, 621, 188, 643], \"label\": \"Text\"}, {\"bbox_2d\": [165, 647, 958, 852], \"label\": \"Text\"}, {\"bbox_2d\": [119, 855, 958, 943], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [528, 40, 544, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [119, 76, 718, 91], \"label\": \"Text\"}, {\"bbox_2d\": [119, 96, 957, 152], \"label\": \"Text\"}, {\"bbox_2d\": [118, 156, 958, 270], \"label\": \"Text\"}, {\"bbox_2d\": [119, 274, 958, 310], \"label\": \"Text\"}, {\"bbox_2d\": [119, 313, 958, 348], \"label\": \"Text\"}, {\"bbox_2d\": [120, 352, 958, 388], \"label\": \"Text\"}, {\"bbox_2d\": [119, 391, 958, 428], \"label\": \"Text\"}, {\"bbox_2d\": [119, 431, 958, 541], \"label\": \"Text\"}, {\"bbox_2d\": [119, 549, 958, 582], \"label\": \"Text\"}, {\"bbox_2d\": [165, 588, 770, 605], \"label\": \"Text\"}, {\"bbox_2d\": [119, 609, 957, 642], \"label\": \"Text\"}, {\"bbox_2d\": [165, 651, 530, 663], \"label\": \"Text\"}, {\"bbox_2d\": [165, 668, 651, 684], \"label\": \"Text\"}, {\"bbox_2d\": [165, 686, 428, 703], \"label\": \"Text\"}, {\"bbox_2d\": [165, 707, 274, 721], \"label\": \"Text\"}, {\"bbox_2d\": [165, 729, 333, 742], \"label\": \"Text\"}, {\"bbox_2d\": [165, 745, 685, 762], \"label\": \"Text\"}, {\"bbox_2d\": [165, 769, 409, 781], \"label\": \"Text\"}, {\"bbox_2d\": [165, 788, 530, 801], \"label\": \"Text\"}, {\"bbox_2d\": [119, 806, 958, 841], \"label\": \"Text\"}, {\"bbox_2d\": [119, 844, 958, 881], \"label\": \"Text\"}, {\"bbox_2d\": [119, 882, 958, 942], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [115, 125, 531, 366], \"label\": \"Text\"}, {\"bbox_2d\": [114, 369, 528, 736], \"label\": \"Text\"}, {\"bbox_2d\": [114, 740, 527, 866], \"label\": \"Text\"}, {\"bbox_2d\": [566, 125, 983, 346], \"label\": \"Text\"}, {\"bbox_2d\": [566, 349, 984, 626], \"label\": \"Text\"}, {\"bbox_2d\": [566, 629, 984, 993], \"label\": \"Text\"}, {\"bbox_2d\": [155, 869, 296, 885], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 886, 350, 902], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 903, 326, 919], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 920, 426, 937], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 939, 527, 990], \"label\": \"List-item\"}, {\"bbox_2d\": [533, 1004, 559, 1017], \"label\": \"Page-footer\"}, {\"bbox_2d\": [688, 83, 973, 110], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [688, 80, 988, 111], \"label\": \"Page-header\"}, {\"bbox_2d\": [567, 122, 991, 344], \"label\": \"Text\"}, {\"bbox_2d\": [116, 122, 539, 363], \"label\": \"Text\"}, {\"bbox_2d\": [568, 345, 992, 624], \"label\": \"Text\"}, {\"bbox_2d\": [116, 365, 538, 735], \"label\": \"Text\"}, {\"bbox_2d\": [567, 625, 990, 989], \"label\": \"Text\"}, {\"bbox_2d\": [116, 737, 537, 863], \"label\": \"Text\"}, {\"bbox_2d\": [156, 863, 304, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 881, 354, 899], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 900, 339, 917], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 917, 440, 935], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 935, 537, 988], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [89, 65, 788, 82], \"label\": \"Page-header\"}, {\"bbox_2d\": [89, 124, 957, 435], \"label\": \"Table\"}, {\"bbox_2d\": [92, 440, 462, 456], \"label\": \"Section-header\"}, {\"bbox_2d\": [92, 488, 949, 553], \"label\": \"Text\"}, {\"bbox_2d\": [89, 560, 957, 910], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [88, 60, 790, 85], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 123, 952, 909], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 28, 613, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [438, 997, 602, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [95, 76, 516, 198], \"label\": \"Picture\"}, {\"bbox_2d\": [569, 76, 916, 198], \"label\": \"Picture\"}, {\"bbox_2d\": [95, 236, 516, 377], \"label\": \"Picture\"}, {\"bbox_2d\": [572, 236, 916, 377], \"label\": \"Picture\"}, {\"bbox_2d\": [96, 423, 516, 592], \"label\": \"Picture\"}, {\"bbox_2d\": [573, 423, 916, 592], \"label\": \"Picture\"}, {\"bbox_2d\": [95, 637, 516, 758], \"label\": \"Picture\"}, {\"bbox_2d\": [573, 637, 916, 758], \"label\": \"Picture\"}, {\"bbox_2d\": [95, 806, 516, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [572, 806, 916, 946], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 440, 234, 461], \"label\": \"Section-header\"}, {\"bbox_2d\": [600, 440, 720, 461], \"label\": \"Section-header\"}, {\"bbox_2d\": [604, 254, 720, 274], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 254, 246, 274], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 94, 217, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 140, 385, 180], \"label\": \"Text\"}, {\"bbox_2d\": [121, 300, 498, 363], \"label\": \"Text\"}, {\"bbox_2d\": [121, 488, 481, 571], \"label\": \"Text\"}, {\"bbox_2d\": [121, 699, 457, 738], \"label\": \"Text\"}, {\"bbox_2d\": [604, 488, 880, 548], \"label\": \"Text\"}, {\"bbox_2d\": [604, 300, 850, 363], \"label\": \"Text\"}, {\"bbox_2d\": [604, 140, 886, 180], \"label\": \"Text\"}, {\"bbox_2d\": [121, 824, 222, 844], \"label\": \"Section-header\"}, {\"bbox_2d\": [600, 656, 693, 675], \"label\": \"Section-header\"}, {\"bbox_2d\": [600, 824, 748, 844], \"label\": \"Section-header\"}, {\"bbox_2d\": [600, 701, 855, 741], \"label\": \"Text\"}, {\"bbox_2d\": [121, 870, 476, 909], \"label\": \"Text\"}, {\"bbox_2d\": [598, 870, 824, 931], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [594, 92, 713, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 92, 218, 114], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 252, 244, 274], \"label\": \"Section-header\"}, {\"bbox_2d\": [602, 252, 722, 274], \"label\": \"Section-header\"}, {\"bbox_2d\": [124, 439, 236, 461], \"label\": \"Section-header\"}, {\"bbox_2d\": [603, 439, 721, 461], \"label\": \"Section-header\"}, {\"bbox_2d\": [603, 653, 694, 675], \"label\": \"Section-header\"}, {\"bbox_2d\": [124, 651, 243, 673], \"label\": \"Section-header\"}, {\"bbox_2d\": [118, 821, 223, 843], \"label\": \"Section-header\"}, {\"bbox_2d\": [597, 821, 749, 843], \"label\": \"Section-header\"}, {\"bbox_2d\": [597, 866, 837, 932], \"label\": \"Text\"}, {\"bbox_2d\": [438, 996, 602, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [118, 866, 484, 909], \"label\": \"Text\"}, {\"bbox_2d\": [124, 696, 460, 740], \"label\": \"Text\"}, {\"bbox_2d\": [603, 698, 864, 741], \"label\": \"Text\"}, {\"bbox_2d\": [603, 483, 889, 549], \"label\": \"Text\"}, {\"bbox_2d\": [124, 483, 489, 572], \"label\": \"Text\"}, {\"bbox_2d\": [602, 297, 855, 363], \"label\": \"Text\"}, {\"bbox_2d\": [122, 297, 505, 363], \"label\": \"Text\"}, {\"bbox_2d\": [122, 136, 416, 180], \"label\": \"Text\"}, {\"bbox_2d\": [594, 136, 887, 180], \"label\": \"Text\"}, {\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 74, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [308, 94, 324, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [187, 111, 469, 172], \"label\": \"Title\"}, {\"bbox_2d\": [187, 189, 437, 198], \"label\": \"Section-header\"}, {\"bbox_2d\": [146, 217, 292, 226], \"label\": \"Text\"}, {\"bbox_2d\": [146, 232, 506, 339], \"label\": \"Text\"}, {\"bbox_2d\": [146, 359, 506, 466], \"label\": \"Text\"}, {\"bbox_2d\": [146, 497, 506, 781], \"label\": \"Text\"}, {\"bbox_2d\": [146, 787, 506, 883], \"label\": \"Text\"}, {\"bbox_2d\": [146, 888, 506, 982], \"label\": \"Text\"}, {\"bbox_2d\": [525, 113, 896, 146], \"label\": \"Text\"}, {\"bbox_2d\": [525, 147, 896, 229], \"label\": \"Text\"}, {\"bbox_2d\": [525, 236, 896, 405], \"label\": \"Text\"}, {\"bbox_2d\": [525, 411, 896, 470], \"label\": \"Text\"}, {\"bbox_2d\": [525, 476, 896, 522], \"label\": \"Text\"}, {\"bbox_2d\": [525, 561, 896, 640], \"label\": \"Text\"}, {\"bbox_2d\": [525, 646, 896, 692], \"label\": \"Text\"}, {\"bbox_2d\": [525, 698, 896, 721], \"label\": \"Text\"}, {\"bbox_2d\": [525, 727, 822, 738], \"label\": \"Text\"}, {\"bbox_2d\": [525, 754, 896, 776], \"label\": \"Text\"}, {\"bbox_2d\": [525, 535, 819, 544], \"label\": \"Section-header\"}, {\"bbox_2d\": [525, 844, 828, 855], \"label\": \"Text\"}, {\"bbox_2d\": [525, 861, 828, 871], \"label\": \"Text\"}, {\"bbox_2d\": [544, 772, 763, 949], \"label\": \"Picture\"}, {\"bbox_2d\": [543, 959, 896, 982], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [703, 93, 717, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 74, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [309, 93, 323, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [526, 111, 894, 145], \"label\": \"Text\"}, {\"bbox_2d\": [524, 148, 895, 232], \"label\": \"Text\"}, {\"bbox_2d\": [130, 885, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [130, 784, 500, 881], \"label\": \"Text\"}, {\"bbox_2d\": [130, 497, 503, 780], \"label\": \"Text\"}, {\"bbox_2d\": [130, 359, 503, 493], \"label\": \"Text\"}, {\"bbox_2d\": [130, 230, 500, 340], \"label\": \"Text\"}, {\"bbox_2d\": [523, 236, 895, 407], \"label\": \"Text\"}, {\"bbox_2d\": [524, 410, 898, 470], \"label\": \"Text\"}, {\"bbox_2d\": [526, 473, 895, 520], \"label\": \"Text\"}, {\"bbox_2d\": [526, 561, 898, 646], \"label\": \"Text\"}, {\"bbox_2d\": [526, 649, 896, 696], \"label\": \"Text\"}, {\"bbox_2d\": [526, 699, 894, 721], \"label\": \"Text\"}, {\"bbox_2d\": [541, 725, 822, 735], \"label\": \"Text\"}, {\"bbox_2d\": [541, 738, 892, 748], \"label\": \"Text\"}, {\"bbox_2d\": [598, 535, 823, 545], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 216, 296, 226], \"label\": \"Section-header\"}, {\"bbox_2d\": [146, 344, 349, 354], \"label\": \"Section-header\"}, {\"bbox_2d\": [188, 189, 442, 199], \"label\": \"Section-header\"}, {\"bbox_2d\": [158, 110, 471, 173], \"label\": \"Title\"}, {\"bbox_2d\": [552, 763, 895, 949], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 959, 890, 982], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [727, 16, 916, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [953, 14, 998, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [119, 97, 962, 406], \"label\": \"Table\"}, {\"bbox_2d\": [119, 404, 925, 426], \"label\": \"Text\"}, {\"bbox_2d\": [119, 424, 955, 457], \"label\": \"Text\"}, {\"bbox_2d\": [119, 455, 951, 478], \"label\": \"Text\"}, {\"bbox_2d\": [119, 822, 948, 845], \"label\": \"Text\"}, {\"bbox_2d\": [119, 843, 948, 876], \"label\": \"Text\"}, {\"bbox_2d\": [119, 884, 961, 907], \"label\": \"Text\"}, {\"bbox_2d\": [862, 998, 991, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [119, 515, 412, 529], \"label\": \"Section-header\"}, {\"bbox_2d\": [119, 525, 962, 821], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [727, 20, 905, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [953, 18, 998, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [118, 99, 964, 405], \"label\": \"Table\"}, {\"bbox_2d\": [120, 407, 926, 426], \"label\": \"Text\"}, {\"bbox_2d\": [159, 428, 964, 457], \"label\": \"Text\"}, {\"bbox_2d\": [159, 459, 955, 478], \"label\": \"Text\"}, {\"bbox_2d\": [118, 518, 964, 824], \"label\": \"Table\"}, {\"bbox_2d\": [120, 827, 953, 845], \"label\": \"Text\"}, {\"bbox_2d\": [159, 848, 959, 887], \"label\": \"Text\"}, {\"bbox_2d\": [159, 890, 961, 908], \"label\": \"Text\"}, {\"bbox_2d\": [949, 1001, 992, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [863, 1003, 937, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [767, 59, 970, 86], \"label\": \"Section-header\"}, {\"bbox_2d\": [716, 99, 823, 139], \"label\": \"Formula\"}, {\"bbox_2d\": [711, 154, 822, 188], \"label\": \"Formula\"}, {\"bbox_2d\": [711, 203, 788, 237], \"label\": \"Formula\"}, {\"bbox_2d\": [561, 240, 614, 254], \"label\": \"Text\"}, {\"bbox_2d\": [596, 259, 972, 330], \"label\": \"Text\"}, {\"bbox_2d\": [596, 335, 973, 406], \"label\": \"Text\"}, {\"bbox_2d\": [559, 411, 602, 425], \"label\": \"Text\"}, {\"bbox_2d\": [596, 429, 771, 444], \"label\": \"Text\"}, {\"bbox_2d\": [599, 448, 797, 463], \"label\": \"Text\"}, {\"bbox_2d\": [597, 467, 819, 482], \"label\": \"Text\"}, {\"bbox_2d\": [596, 486, 834, 501], \"label\": \"Text\"}, {\"bbox_2d\": [559, 505, 972, 537], \"label\": \"Text\"}, {\"bbox_2d\": [559, 542, 972, 862], \"label\": \"Text\"}, {\"bbox_2d\": [559, 867, 972, 956], \"label\": \"Text\"}, {\"bbox_2d\": [108, 103, 523, 249], \"label\": \"Text\"}, {\"bbox_2d\": [108, 254, 520, 495], \"label\": \"Text\"}, {\"bbox_2d\": [108, 500, 523, 715], \"label\": \"Text\"}, {\"bbox_2d\": [108, 720, 521, 957], \"label\": \"Text\"}, {\"bbox_2d\": [526, 988, 551, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 100, 536, 249], \"label\": \"Text\"}, {\"bbox_2d\": [108, 250, 528, 494], \"label\": \"Text\"}, {\"bbox_2d\": [107, 495, 531, 711], \"label\": \"Text\"}, {\"bbox_2d\": [106, 714, 532, 956], \"label\": \"Text\"}, {\"bbox_2d\": [560, 858, 989, 951], \"label\": \"Text\"}, {\"bbox_2d\": [559, 595, 987, 858], \"label\": \"Text\"}, {\"bbox_2d\": [560, 498, 976, 535], \"label\": \"Text\"}, {\"bbox_2d\": [600, 539, 878, 556], \"label\": \"List-item\"}, {\"bbox_2d\": [600, 558, 954, 575], \"label\": \"List-item\"}, {\"bbox_2d\": [600, 576, 932, 594], \"label\": \"List-item\"}, {\"bbox_2d\": [561, 404, 605, 422], \"label\": \"Text\"}, {\"bbox_2d\": [560, 235, 622, 253], \"label\": \"Text\"}, {\"bbox_2d\": [599, 254, 984, 328], \"label\": \"List-item\"}, {\"bbox_2d\": [599, 329, 988, 403], \"label\": \"List-item\"}, {\"bbox_2d\": [601, 423, 776, 441], \"label\": \"Text\"}, {\"bbox_2d\": [601, 442, 810, 459], \"label\": \"Text\"}, {\"bbox_2d\": [600, 460, 823, 478], \"label\": \"Text\"}, {\"bbox_2d\": [595, 479, 835, 497], \"label\": \"Text\"}, {\"bbox_2d\": [526, 983, 559, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [708, 95, 843, 235], \"label\": \"Formula\"}, {\"bbox_2d\": [767, 55, 991, 86], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 93, 492, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 660, 361, 674], \"label\": \"Section-header\"}, {\"bbox_2d\": [141, 762, 911, 789], \"label\": \"Text\"}, {\"bbox_2d\": [231, 694, 926, 737], \"label\": \"Text\"}, {\"bbox_2d\": [231, 815, 818, 827], \"label\": \"Text\"}, {\"bbox_2d\": [231, 127, 926, 186], \"label\": \"Text\"}, {\"bbox_2d\": [231, 204, 479, 214], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 225, 921, 625], \"label\": \"Text\"}, {\"bbox_2d\": [231, 791, 926, 801], \"label\": \"Text\"}, {\"bbox_2d\": [231, 839, 653, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 855, 638, 865], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 871, 838, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 886, 395, 896], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 902, 407, 912], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 918, 396, 928], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 933, 885, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [739, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 127, 926, 186], \"label\": \"Text\"}, {\"bbox_2d\": [231, 693, 926, 737], \"label\": \"Text\"}, {\"bbox_2d\": [241, 761, 916, 789], \"label\": \"Text\"}, {\"bbox_2d\": [231, 815, 819, 827], \"label\": \"Text\"}, {\"bbox_2d\": [231, 837, 653, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 853, 637, 865], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 869, 838, 881], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 884, 395, 897], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 900, 407, 912], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 916, 399, 928], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 932, 885, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [738, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 203, 476, 214], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 93, 492, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 659, 361, 675], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 225, 923, 624], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [419, 469, 632, 483], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [419, 466, 631, 482], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [368, 79, 674, 91], \"label\": \"Page-header\"}, {\"bbox_2d\": [228, 138, 808, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 210, 507, 222], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 336, 632, 348], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 231, 926, 321], \"label\": \"Text\"}, {\"bbox_2d\": [115, 356, 926, 426], \"label\": \"Text\"}, {\"bbox_2d\": [115, 433, 926, 524], \"label\": \"Text\"}, {\"bbox_2d\": [115, 531, 926, 582], \"label\": \"Text\"}, {\"bbox_2d\": [115, 662, 236, 713], \"label\": \"Caption\"}, {\"bbox_2d\": [115, 791, 850, 801], \"label\": \"Text\"}, {\"bbox_2d\": [513, 999, 525, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [108, 636, 362, 664], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [367, 79, 668, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [228, 135, 807, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 209, 507, 221], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 335, 636, 348], \"label\": \"Section-header\"}, {\"bbox_2d\": [512, 999, 523, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [115, 229, 923, 318], \"label\": \"Text\"}, {\"bbox_2d\": [115, 356, 923, 426], \"label\": \"Text\"}, {\"bbox_2d\": [115, 434, 923, 523], \"label\": \"Text\"}, {\"bbox_2d\": [115, 531, 923, 581], \"label\": \"Text\"}, {\"bbox_2d\": [115, 663, 238, 713], \"label\": \"Text\"}, {\"bbox_2d\": [102, 630, 364, 662], \"label\": \"Picture\"}, {\"bbox_2d\": [115, 791, 849, 801], \"label\": \"Footnote\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [90, 64, 292, 95], \"label\": \"Page-header\"}, {\"bbox_2d\": [370, 923, 702, 940], \"label\": \"Caption\"}, {\"bbox_2d\": [521, 988, 548, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [267, 113, 827, 899], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [90, 66, 306, 96], \"label\": \"Page-header\"}, {\"bbox_2d\": [251, 114, 831, 909], \"label\": \"Picture\"}, {\"bbox_2d\": [369, 926, 691, 941], \"label\": \"Caption\"}, {\"bbox_2d\": [520, 988, 553, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [340, 42, 959, 74], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 120, 911, 163], \"label\": \"Text\"}, {\"bbox_2d\": [115, 181, 911, 223], \"label\": \"Text\"}, {\"bbox_2d\": [115, 241, 926, 313], \"label\": \"Text\"}, {\"bbox_2d\": [115, 331, 877, 374], \"label\": \"Text\"}, {\"bbox_2d\": [115, 392, 926, 434], \"label\": \"Text\"}, {\"bbox_2d\": [115, 452, 915, 494], \"label\": \"Text\"}, {\"bbox_2d\": [115, 546, 926, 604], \"label\": \"Text\"}, {\"bbox_2d\": [115, 651, 926, 755], \"label\": \"Text\"}, {\"bbox_2d\": [115, 773, 916, 815], \"label\": \"Text\"}, {\"bbox_2d\": [115, 833, 920, 905], \"label\": \"Text\"}, {\"bbox_2d\": [115, 923, 926, 965], \"label\": \"Text\"}, {\"bbox_2d\": [115, 621, 350, 634], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 520, 408, 533], \"label\": \"Section-header\"}, {\"bbox_2d\": [891, 990, 972, 1001], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [115, 120, 911, 132], \"label\": \"Text\"}, {\"bbox_2d\": [115, 135, 909, 147], \"label\": \"Text\"}, {\"bbox_2d\": [115, 150, 859, 162], \"label\": \"Text\"}, {\"bbox_2d\": [115, 180, 926, 222], \"label\": \"Text\"}, {\"bbox_2d\": [115, 240, 938, 312], \"label\": \"Text\"}, {\"bbox_2d\": [115, 331, 885, 373], \"label\": \"Text\"}, {\"bbox_2d\": [115, 391, 927, 433], \"label\": \"Text\"}, {\"bbox_2d\": [115, 451, 916, 493], \"label\": \"Text\"}, {\"bbox_2d\": [115, 546, 929, 603], \"label\": \"Text\"}, {\"bbox_2d\": [115, 651, 926, 753], \"label\": \"Text\"}, {\"bbox_2d\": [115, 771, 917, 813], \"label\": \"Text\"}, {\"bbox_2d\": [115, 832, 919, 904], \"label\": \"Text\"}, {\"bbox_2d\": [115, 922, 940, 964], \"label\": \"Text\"}, {\"bbox_2d\": [779, 989, 973, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [115, 521, 407, 533], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 621, 350, 633], \"label\": \"Section-header\"}, {\"bbox_2d\": [128, 43, 958, 74], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 25, 351, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 91, 131, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 295, 218, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 452, 262, 466], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 551, 229, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 107, 508, 164], \"label\": \"Text\"}, {\"bbox_2d\": [91, 171, 508, 259], \"label\": \"Text\"}, {\"bbox_2d\": [91, 272, 420, 286], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 310, 508, 439], \"label\": \"Text\"}, {\"bbox_2d\": [91, 466, 508, 538], \"label\": \"Text\"}, {\"bbox_2d\": [91, 566, 508, 609], \"label\": \"Text\"}, {\"bbox_2d\": [91, 998, 445, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 351, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 132, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 275, 421, 286], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 298, 219, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 109, 507, 163], \"label\": \"Text\"}, {\"bbox_2d\": [91, 175, 507, 257], \"label\": \"Text\"}, {\"bbox_2d\": [91, 312, 507, 437], \"label\": \"Text\"}, {\"bbox_2d\": [91, 469, 507, 537], \"label\": \"Text\"}, {\"bbox_2d\": [91, 568, 507, 607], \"label\": \"Text\"}, {\"bbox_2d\": [91, 554, 229, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 455, 262, 465], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 1000, 444, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 34, 627, 97], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 149, 396, 175], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 322, 226, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 149, 744, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 469, 666, 480], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 401, 250, 411], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 486, 407, 496], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 735, 862, 762], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 504, 264, 514], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 570, 206, 580], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 673, 284, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 779, 299, 789], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 844, 164, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [72, 180, 469, 291], \"label\": \"Text\"}, {\"bbox_2d\": [518, 166, 901, 233], \"label\": \"Text\"}, {\"bbox_2d\": [72, 340, 458, 377], \"label\": \"Text\"}, {\"bbox_2d\": [86, 417, 450, 468], \"label\": \"Text\"}, {\"bbox_2d\": [518, 410, 906, 433], \"label\": \"Text\"}, {\"bbox_2d\": [72, 417, 450, 429], \"label\": \"Text\"}, {\"bbox_2d\": [72, 433, 450, 445], \"label\": \"Text\"}, {\"bbox_2d\": [544, 237, 914, 260], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 266, 909, 318], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 324, 868, 361], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 367, 906, 404], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 519, 460, 565], \"label\": \"Text\"}, {\"bbox_2d\": [101, 580, 457, 670], \"label\": \"Text\"}, {\"bbox_2d\": [101, 686, 457, 777], \"label\": \"Text\"}, {\"bbox_2d\": [101, 792, 457, 837], \"label\": \"Text\"}, {\"bbox_2d\": [101, 855, 444, 931], \"label\": \"Text\"}, {\"bbox_2d\": [518, 767, 906, 804], \"label\": \"Text\"}, {\"bbox_2d\": [518, 809, 914, 861], \"label\": \"Text\"}, {\"bbox_2d\": [60, 999, 170, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [518, 491, 666, 503], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 491, 863, 503], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 876, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 876, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 876, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"label\": \"Section-header\"}, {\"bbox_2d\": [518, 503, 865, 526], \"", "label": "```json[{\"bbox_2d\": [146, 60, 628, 78], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 149, 395, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 149, 745, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 324, 226, 336], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 402, 250, 411], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 469, 666, 480], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 487, 407, 496], \"label\": \"Section-header\"}, {\"bbox_2d\": [517, 735, 867, 762], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 505, 263, 513], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 570, 206, 578], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 675, 284, 682], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 844, 163, 852], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 779, 299, 787], \"label\": \"Section-header\"}, {\"bbox_2d\": [58, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [72, 181, 472, 291], \"label\": \"Text\"}, {\"bbox_2d\": [517, 165, 905, 232], \"label\": \"Text\"}, {\"bbox_2d\": [72, 340, 463, 378], \"label\": \"Text\"}, {\"bbox_2d\": [86, 420, 453, 467], \"label\": \"Text\"}, {\"bbox_2d\": [102, 518, 462, 565], \"label\": \"Text\"}, {\"bbox_2d\": [102, 583, 460, 669], \"label\": \"Text\"}, {\"bbox_2d\": [102, 688, 462, 774], \"label\": \"Text\"}, {\"bbox_2d\": [102, 792, 460, 839], \"label\": \"Text\"}, {\"bbox_2d\": [102, 858, 447, 930], \"label\": \"Text\"}, {\"bbox_2d\": [517, 810, 918, 862], \"label\": \"Text\"}, {\"bbox_2d\": [517, 767, 909, 805], \"label\": \"Text\"}, {\"bbox_2d\": [517, 409, 910, 433], \"label\": \"Text\"}, {\"bbox_2d\": [541, 366, 910, 404], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 323, 897, 361], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 266, 913, 318], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 237, 919, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [517, 488, 913, 708], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 117, 506, 351], \"label\": \"Text\"}, {\"bbox_2d\": [178, 362, 447, 374], \"label\": \"Section-header\"}, {\"bbox_2d\": [209, 395, 414, 407], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 429, 506, 503], \"label\": \"Text\"}, {\"bbox_2d\": [121, 504, 506, 560], \"label\": \"Text\"}, {\"bbox_2d\": [121, 590, 495, 821], \"label\": \"Picture\"}, {\"bbox_2d\": [533, 391, 875, 584], \"label\": \"Picture\"}, {\"bbox_2d\": [533, 593, 914, 622], \"label\": \"Caption\"}, {\"bbox_2d\": [533, 666, 914, 714], \"label\": \"Text\"}, {\"bbox_2d\": [533, 715, 914, 966], \"label\": \"Text\"}, {\"bbox_2d\": [121, 850, 506, 950], \"label\": \"Caption\"}, {\"bbox_2d\": [516, 989, 525, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [533, 278, 914, 358], \"label\": \"Caption\"}, {\"bbox_2d\": [538, 109, 686, 267], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [121, 116, 503, 324], \"label\": \"Text\"}, {\"bbox_2d\": [121, 428, 502, 501], \"label\": \"Text\"}, {\"bbox_2d\": [121, 504, 503, 562], \"label\": \"Text\"}, {\"bbox_2d\": [533, 662, 914, 737], \"label\": \"Text\"}, {\"bbox_2d\": [533, 741, 914, 963], \"label\": \"Text\"}, {\"bbox_2d\": [513, 989, 522, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [533, 279, 914, 359], \"label\": \"Caption\"}, {\"bbox_2d\": [533, 592, 914, 617], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 851, 502, 945], \"label\": \"Caption\"}, {\"bbox_2d\": [178, 362, 446, 372], \"label\": \"Section-header\"}, {\"bbox_2d\": [210, 395, 414, 406], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 593, 501, 824], \"label\": \"Picture\"}, {\"bbox_2d\": [530, 393, 877, 576], \"label\": \"Picture\"}, {\"bbox_2d\": [652, 112, 793, 265], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [220, 37, 814, 48], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 81, 1009, 159], \"label\": \"Text\"}, {\"bbox_2d\": [25, 185, 1009, 231], \"label\": \"Text\"}, {\"bbox_2d\": [25, 239, 1009, 276], \"label\": \"Text\"}, {\"bbox_2d\": [25, 284, 1009, 336], \"label\": \"Text\"}, {\"bbox_2d\": [25, 363, 1009, 457], \"label\": \"Text\"}, {\"bbox_2d\": [25, 484, 1009, 577], \"label\": \"Text\"}, {\"bbox_2d\": [25, 584, 1009, 629], \"label\": \"Text\"}, {\"bbox_2d\": [44, 469, 362, 479], \"label\": \"Section-header\"}, {\"bbox_2d\": [44, 350, 212, 360], \"label\": \"Section-header\"}, {\"bbox_2d\": [44, 169, 338, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [504, 652, 529, 663], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 155, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [219, 37, 815, 48], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 168, 338, 179], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 350, 212, 360], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 469, 362, 480], \"label\": \"Section-header\"}, {\"bbox_2d\": [504, 652, 529, 662], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 584, 1009, 628], \"label\": \"Text\"}, {\"bbox_2d\": [25, 484, 1009, 575], \"label\": \"Text\"}, {\"bbox_2d\": [25, 365, 1009, 455], \"label\": \"Text\"}, {\"bbox_2d\": [25, 280, 1009, 336], \"label\": \"Text\"}, {\"bbox_2d\": [25, 238, 1009, 271], \"label\": \"Text\"}, {\"bbox_2d\": [25, 184, 1009, 229], \"label\": \"Text\"}, {\"bbox_2d\": [25, 80, 1009, 159], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [667, 222, 784, 242], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 196, 524, 218], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 295, 508, 513], \"label\": \"Text\"}, {\"bbox_2d\": [147, 520, 504, 705], \"label\": \"Text\"}, {\"bbox_2d\": [147, 712, 505, 737], \"label\": \"Text\"}, {\"bbox_2d\": [147, 761, 504, 840], \"label\": \"Text\"}, {\"bbox_2d\": [548, 726, 897, 839], \"label\": \"Text\"}, {\"bbox_2d\": [548, 606, 908, 720], \"label\": \"Text\"}, {\"bbox_2d\": [548, 486, 905, 600], \"label\": \"Text\"}, {\"bbox_2d\": [548, 296, 905, 478], \"label\": \"Text\"}, {\"bbox_2d\": [148, 961, 472, 969], \"label\": \"Caption\"}, {\"bbox_2d\": [548, 961, 729, 970], \"label\": \"Caption\"}, {\"bbox_2d\": [146, 850, 514, 958], \"label\": \"Picture\"}, {\"bbox_2d\": [546, 850, 913, 958], \"label\": \"Picture\"}, {\"bbox_2d\": [814, 1002, 984, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [148, 120, 428, 158], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [148, 196, 525, 217], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 297, 512, 513], \"label\": \"Text\"}, {\"bbox_2d\": [148, 521, 512, 703], \"label\": \"Text\"}, {\"bbox_2d\": [148, 710, 507, 737], \"label\": \"Text\"}, {\"bbox_2d\": [148, 762, 506, 843], \"label\": \"Text\"}, {\"bbox_2d\": [548, 727, 897, 843], \"label\": \"Text\"}, {\"bbox_2d\": [548, 607, 910, 722], \"label\": \"Text\"}, {\"bbox_2d\": [548, 486, 906, 602], \"label\": \"Text\"}, {\"bbox_2d\": [548, 297, 905, 481], \"label\": \"Text\"}, {\"bbox_2d\": [1014, 581, 1024, 633], \"label\": \"Page-header\"}, {\"bbox_2d\": [548, 963, 730, 969], \"label\": \"Caption\"}, {\"bbox_2d\": [148, 963, 473, 969], \"label\": \"Caption\"}, {\"bbox_2d\": [813, 1002, 985, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [666, 223, 786, 242], \"label\": \"Page-footer\"}, {\"bbox_2d\": [148, 123, 235, 134], \"label\": \"Text\"}, {\"bbox_2d\": [665, 0, 993, 213], \"label\": \"Picture\"}, {\"bbox_2d\": [147, 850, 514, 959], \"label\": \"Picture\"}, {\"bbox_2d\": [548, 850, 913, 959], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [567, 69, 621, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [39, 37, 1010, 61], \"label\": \"Text\"}, {\"bbox_2d\": [25, 83, 254, 117], \"label\": \"Text\"}, {\"bbox_2d\": [512, 146, 521, 154], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [37, 38, 1012, 59], \"label\": \"Text\"}, {\"bbox_2d\": [567, 67, 621, 74], \"label\": \"Text\"}, {\"bbox_2d\": [23, 84, 254, 91], \"label\": \"Text\"}, {\"bbox_2d\": [23, 100, 105, 115], \"label\": \"Text\"}, {\"bbox_2d\": [511, 146, 522, 153], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [128, 17, 310, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [73, 101, 323, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 117, 204, 128], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 416, 222, 427], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 691, 363, 702], \"label\": \"Section-header\"}, {\"bbox_2d\": [46, 998, 170, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [73, 984, 485, 992], \"label\": \"Text\"}, {\"bbox_2d\": [68, 127, 917, 406], \"label\": \"Table\"}, {\"bbox_2d\": [69, 421, 918, 683], \"label\": \"Table\"}, {\"bbox_2d\": [68, 696, 918, 981], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [37, 18, 308, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [67, 115, 919, 408], \"label\": \"Table\"}, {\"bbox_2d\": [67, 689, 917, 982], \"label\": \"Table\"}, {\"bbox_2d\": [67, 415, 917, 685], \"label\": \"Table\"}, {\"bbox_2d\": [72, 984, 484, 992], \"label\": \"Text\"}, {\"bbox_2d\": [73, 102, 435, 112], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 24, 156, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 71, 83, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 85, 1009, 121], \"label\": \"Text\"}, {\"bbox_2d\": [22, 140, 1009, 555], \"label\": \"Table\"}, {\"bbox_2d\": [504, 584, 529, 594], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 71, 83, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 87, 1009, 120], \"label\": \"Text\"}, {\"bbox_2d\": [504, 584, 529, 594], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 139, 1009, 557], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [301, 35, 733, 57], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 79, 313, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 93, 994, 162], \"label\": \"Text\"}, {\"bbox_2d\": [25, 170, 1003, 272], \"label\": \"Text\"}, {\"bbox_2d\": [41, 283, 257, 293], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 297, 994, 343], \"label\": \"Text\"}, {\"bbox_2d\": [41, 352, 195, 362], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 367, 1008, 483], \"label\": \"Text\"}, {\"bbox_2d\": [25, 490, 997, 524], \"label\": \"Text\"}, {\"bbox_2d\": [58, 532, 770, 543], \"label\": \"Text\"}, {\"bbox_2d\": [508, 556, 525, 567], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [301, 35, 733, 57], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 78, 314, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 281, 257, 292], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 94, 995, 161], \"label\": \"Text\"}, {\"bbox_2d\": [25, 170, 1007, 272], \"label\": \"Text\"}, {\"bbox_2d\": [25, 296, 992, 341], \"label\": \"Text\"}, {\"bbox_2d\": [25, 367, 1006, 480], \"label\": \"Text\"}, {\"bbox_2d\": [25, 489, 971, 523], \"label\": \"Text\"}, {\"bbox_2d\": [57, 532, 770, 543], \"label\": \"Text\"}, {\"bbox_2d\": [41, 350, 196, 361], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 555, 525, 566], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [323, 100, 707, 113], \"label\": \"Page-header\"}, {\"bbox_2d\": [441, 159, 595, 177], \"label\": \"Section-header\"}, {\"bbox_2d\": [175, 201, 861, 243], \"label\": \"Section-header\"}, {\"bbox_2d\": [164, 266, 871, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [194, 332, 844, 371], \"label\": \"Text\"}, {\"bbox_2d\": [155, 498, 880, 540], \"label\": \"Text\"}, {\"bbox_2d\": [174, 410, 748, 475], \"label\": \"Table\"}, {\"bbox_2d\": [155, 577, 500, 589], \"label\": \"Section-header\"}, {\"bbox_2d\": [174, 601, 880, 629], \"label\": \"Text\"}, {\"bbox_2d\": [155, 636, 880, 664], \"label\": \"Text\"}, {\"bbox_2d\": [174, 672, 551, 683], \"label\": \"Text\"}, {\"bbox_2d\": [155, 721, 880, 749], \"label\": \"Section-header\"}, {\"bbox_2d\": [174, 809, 718, 821], \"label\": \"Text\"}, {\"bbox_2d\": [155, 909, 243, 918], \"label\": \"Footnote\"}, {\"bbox_2d\": [155, 920, 237, 929], \"label\": \"Footnote\"}, {\"bbox_2d\": [187, 829, 475, 841], \"label\": \"List-item\"}, {\"bbox_2d\": [187, 848, 752, 860], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 760, 880, 802], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [324, 102, 706, 114], \"label\": \"Page-header\"}, {\"bbox_2d\": [442, 160, 601, 178], \"label\": \"Title\"}, {\"bbox_2d\": [176, 202, 867, 243], \"label\": \"Title\"}, {\"bbox_2d\": [165, 267, 878, 308], \"label\": \"Title\"}, {\"bbox_2d\": [191, 332, 852, 372], \"label\": \"Text\"}, {\"bbox_2d\": [156, 499, 884, 540], \"label\": \"Text\"}, {\"bbox_2d\": [156, 602, 884, 629], \"label\": \"Text\"}, {\"bbox_2d\": [156, 637, 884, 664], \"label\": \"Text\"}, {\"bbox_2d\": [171, 672, 557, 684], \"label\": \"Text\"}, {\"bbox_2d\": [156, 760, 884, 802], \"label\": \"Text\"}, {\"bbox_2d\": [171, 810, 722, 822], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 830, 479, 842], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 850, 756, 861], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 910, 248, 919], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 921, 241, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 721, 884, 748], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 578, 504, 590], \"label\": \"Section-header\"}, {\"bbox_2d\": [290, 411, 754, 474], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [272, 87, 911, 744], \"label\": \"Table\"}, {\"bbox_2d\": [273, 770, 892, 816], \"label\": \"Text\"}, {\"bbox_2d\": [273, 896, 897, 926], \"label\": \"Text\"}, {\"bbox_2d\": [273, 835, 910, 878], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 379, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [272, 85, 910, 743], \"label\": \"Table\"}, {\"bbox_2d\": [273, 771, 892, 815], \"label\": \"Text\"}, {\"bbox_2d\": [273, 833, 909, 878], \"label\": \"Text\"}, {\"bbox_2d\": [273, 896, 897, 925], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 42, 988, 64], \"label\": \"Text\"}, {\"bbox_2d\": [25, 91, 997, 178], \"label\": \"Text\"}, {\"bbox_2d\": [25, 183, 998, 258], \"label\": \"Text\"}, {\"bbox_2d\": [25, 267, 999, 344], \"label\": \"Text\"}, {\"bbox_2d\": [25, 371, 999, 451], \"label\": \"Text\"}, {\"bbox_2d\": [25, 453, 997, 523], \"label\": \"Text\"}, {\"bbox_2d\": [25, 356, 311, 367], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 75, 220, 86], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 534, 525, 544], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 12, 142, 23], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 74, 221, 85], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 41, 988, 63], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 89, 998, 179], \"label\": \"Text\"}, {\"bbox_2d\": [25, 183, 1003, 262], \"label\": \"Text\"}, {\"bbox_2d\": [25, 266, 1004, 345], \"label\": \"Text\"}, {\"bbox_2d\": [25, 370, 1002, 449], \"label\": \"Text\"}, {\"bbox_2d\": [25, 453, 1004, 521], \"label\": \"Text\"}, {\"bbox_2d\": [25, 355, 310, 366], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 533, 525, 544], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [538, 156, 950, 287], \"label\": \"Text\"}, {\"bbox_2d\": [82, 156, 495, 365], \"label\": \"Text\"}, {\"bbox_2d\": [538, 333, 953, 422], \"label\": \"Text\"}, {\"bbox_2d\": [538, 441, 932, 491], \"label\": \"Text\"}, {\"bbox_2d\": [82, 383, 493, 433], \"label\": \"Text\"}, {\"bbox_2d\": [82, 441, 488, 747], \"label\": \"Text\"}, {\"bbox_2d\": [538, 499, 794, 510], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 519, 718, 530], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 539, 771, 550], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 559, 752, 570], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 579, 725, 590], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 599, 857, 610], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 619, 916, 630], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 639, 860, 650], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 659, 769, 669], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 742, 649, 753], \"label\": \"Caption\"}, {\"bbox_2d\": [538, 765, 848, 775], \"label\": \"Caption\"}, {\"bbox_2d\": [82, 829, 453, 953], \"label\": \"Text\"}, {\"bbox_2d\": [538, 693, 725, 733], \"label\": \"Picture\"}, {\"bbox_2d\": [538, 746, 649, 757], \"label\": \"Caption\"}, {\"bbox_2d\": [538, 310, 786, 326], \"label\": \"Section-header\"}, {\"bbox_2d\": [822, 1004, 953, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [537, 792, 1024, 980], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [538, 156, 954, 284], \"label\": \"Text\"}, {\"bbox_2d\": [81, 156, 495, 363], \"label\": \"Text\"}, {\"bbox_2d\": [81, 382, 495, 745], \"label\": \"Text\"}, {\"bbox_2d\": [538, 333, 954, 422], \"label\": \"Text\"}, {\"bbox_2d\": [538, 440, 933, 490], \"label\": \"Text\"}, {\"bbox_2d\": [538, 519, 718, 529], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 499, 794, 510], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 539, 771, 549], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 558, 752, 569], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 578, 725, 588], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 597, 858, 608], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 617, 916, 628], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 637, 862, 647], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 656, 771, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 745, 849, 775], \"label\": \"Caption\"}, {\"bbox_2d\": [538, 694, 725, 734], \"label\": \"Picture\"}, {\"bbox_2d\": [81, 827, 454, 952], \"label\": \"Text\"}, {\"bbox_2d\": [822, 1005, 924, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [948, 1005, 955, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [515, 791, 1036, 983], \"label\": \"Picture\"}, {\"bbox_2d\": [538, 311, 787, 324], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [340, 42, 960, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [659, 105, 960, 133], \"label\": \"Text\"}, {\"bbox_2d\": [131, 190, 886, 747], \"label\": \"Table\"}, {\"bbox_2d\": [128, 953, 546, 965], \"label\": \"Text\"}, {\"bbox_2d\": [780, 989, 974, 1000], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [340, 43, 958, 74], \"label\": \"Page-header\"}, {\"bbox_2d\": [781, 990, 974, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [128, 954, 546, 966], \"label\": \"Text\"}, {\"bbox_2d\": [125, 104, 959, 747], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [279, 38, 671, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [44, 974, 172, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [598, 857, 785, 867], \"label\": \"Section-header\"}, {\"bbox_2d\": [510, 890, 905, 952], \"label\": \"Text\"}, {\"bbox_2d\": [510, 569, 905, 821], \"label\": \"Text\"}, {\"bbox_2d\": [510, 389, 905, 562], \"label\": \"Text\"}, {\"bbox_2d\": [87, 573, 482, 797], \"label\": \"Text\"}, {\"bbox_2d\": [87, 799, 482, 952], \"label\": \"Text\"}, {\"bbox_2d\": [87, 401, 482, 536], \"label\": \"Caption\"}, {\"bbox_2d\": [509, 288, 905, 354], \"label\": \"Caption\"}, {\"bbox_2d\": [92, 74, 477, 373], \"label\": \"Picture\"}, {\"bbox_2d\": [521, 77, 889, 269], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [88, 572, 481, 795], \"label\": \"Text\"}, {\"bbox_2d\": [88, 800, 481, 948], \"label\": \"Text\"}, {\"bbox_2d\": [517, 75, 902, 271], \"label\": \"Picture\"}, {\"bbox_2d\": [511, 387, 904, 565], \"label\": \"Text\"}, {\"bbox_2d\": [511, 567, 904, 820], \"label\": \"Text\"}, {\"bbox_2d\": [511, 890, 904, 948], \"label\": \"Text\"}, {\"bbox_2d\": [276, 38, 670, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [43, 38, 52, 51], \"label\": \"Page-header\"}, {\"bbox_2d\": [43, 974, 172, 984], \"label\": \"Page-footer\"}, {\"bbox_2d\": [629, 857, 785, 867], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 400, 480, 535], \"label\": \"Caption\"}, {\"bbox_2d\": [511, 287, 903, 354], \"label\": \"Caption\"}, {\"bbox_2d\": [89, 70, 480, 385], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [101, 84, 385, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 235, 424, 256], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 401, 297, 422], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 114, 868, 147], \"label\": \"Text\"}, {\"bbox_2d\": [274, 149, 710, 167], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 169, 388, 186], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 200, 574, 217], \"label\": \"Text\"}, {\"bbox_2d\": [274, 264, 885, 297], \"label\": \"Text\"}, {\"bbox_2d\": [274, 300, 710, 318], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 320, 388, 337], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 351, 908, 383], \"label\": \"Text\"}, {\"bbox_2d\": [274, 430, 894, 463], \"label\": \"Text\"}, {\"bbox_2d\": [274, 465, 710, 483], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 485, 388, 502], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 516, 552, 533], \"label\": \"Text\"}, {\"bbox_2d\": [297, 542, 777, 827], \"label\": \"Text\"}, {\"bbox_2d\": [274, 871, 887, 904], \"label\": \"Text\"}, {\"bbox_2d\": [101, 977, 371, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 89, 385, 106], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 239, 423, 256], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 405, 296, 422], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 875, 886, 903], \"label\": \"Text\"}, {\"bbox_2d\": [273, 118, 867, 146], \"label\": \"Text\"}, {\"bbox_2d\": [273, 204, 573, 217], \"label\": \"Text\"}, {\"bbox_2d\": [273, 268, 886, 297], \"label\": \"Text\"}, {\"bbox_2d\": [273, 354, 908, 383], \"label\": \"Text\"}, {\"bbox_2d\": [273, 433, 892, 462], \"label\": \"Text\"}, {\"bbox_2d\": [273, 520, 551, 533], \"label\": \"Text\"}, {\"bbox_2d\": [297, 544, 776, 826], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 466, 710, 482], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 488, 388, 501], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 322, 388, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 301, 710, 316], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 151, 710, 166], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 172, 388, 186], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [86, 746, 934, 780], \"label\": \"Section-header\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [88, 778, 962, 982], \"label\": \"Table\"}, {\"bbox_2d\": [86, 76, 962, 723], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 77, 989, 722], \"label\": \"Table\"}, {\"bbox_2d\": [86, 776, 990, 979], \"label\": \"Table\"}, {\"bbox_2d\": [477, 1003, 592, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [87, 745, 933, 779], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [241, 103, 917, 137], \"label\": \"Text\"}, {\"bbox_2d\": [231, 200, 726, 215], \"label\": \"Text\"}, {\"bbox_2d\": [231, 253, 907, 286], \"label\": \"Text\"}, {\"bbox_2d\": [231, 299, 735, 314], \"label\": \"Text\"}, {\"bbox_2d\": [231, 404, 917, 434], \"label\": \"Text\"}, {\"bbox_2d\": [231, 526, 926, 583], \"label\": \"Text\"}, {\"bbox_2d\": [231, 592, 926, 640], \"label\": \"Text\"}, {\"bbox_2d\": [231, 651, 917, 684], \"label\": \"Text\"}, {\"bbox_2d\": [231, 747, 926, 795], \"label\": \"Text\"}, {\"bbox_2d\": [231, 860, 926, 908], \"label\": \"Text\"}, {\"bbox_2d\": [109, 984, 688, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 713, 536, 731], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 505, 438, 523], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 234, 529, 252], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 167, 439, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 321, 907, 352], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 359, 926, 390], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 441, 560, 456], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 458, 617, 473], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 475, 646, 488], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 800, 813, 816], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 818, 758, 833], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 835, 467, 847], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [241, 104, 916, 135], \"label\": \"Text\"}, {\"bbox_2d\": [231, 203, 721, 215], \"label\": \"Text\"}, {\"bbox_2d\": [231, 253, 908, 285], \"label\": \"Text\"}, {\"bbox_2d\": [231, 301, 730, 313], \"label\": \"Text\"}, {\"bbox_2d\": [231, 406, 916, 433], \"label\": \"Text\"}, {\"bbox_2d\": [231, 529, 926, 556], \"label\": \"Text\"}, {\"bbox_2d\": [231, 595, 926, 639], \"label\": \"Text\"}, {\"bbox_2d\": [241, 654, 916, 682], \"label\": \"Text\"}, {\"bbox_2d\": [231, 750, 926, 793], \"label\": \"Text\"}, {\"bbox_2d\": [231, 864, 926, 907], \"label\": \"Text\"}, {\"bbox_2d\": [231, 835, 463, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 819, 754, 831], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 803, 813, 815], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 567, 646, 578], \"label\": \"Text\"}, {\"bbox_2d\": [231, 475, 645, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 459, 617, 471], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 444, 559, 456], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 323, 908, 351], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 361, 926, 389], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 169, 435, 184], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 237, 523, 251], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 509, 432, 523], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 716, 531, 731], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 987, 685, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [549, 39, 910, 77], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 38, 280, 79], \"label\": \"Section-header\"}, {\"bbox_2d\": [17, 97, 109, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 283, 260, 302], \"label\": \"Section-header\"}, {\"bbox_2d\": [20, 527, 361, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [546, 526, 825, 562], \"label\": \"Section-header\"}, {\"bbox_2d\": [546, 700, 764, 731], \"label\": \"Section-header\"}, {\"bbox_2d\": [24, 956, 47, 978], \"label\": \"Page-footer\"}, {\"bbox_2d\": [966, 958, 992, 980], \"label\": \"Page-footer\"}, {\"bbox_2d\": [125, 902, 361, 923], \"label\": \"Caption\"}, {\"bbox_2d\": [103, 683, 384, 882], \"label\": \"Picture\"}, {\"bbox_2d\": [24, 121, 210, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [24, 178, 211, 225], \"label\": \"List-item\"}, {\"bbox_2d\": [24, 326, 465, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [24, 367, 435, 506], \"label\": \"List-item\"}, {\"bbox_2d\": [24, 558, 456, 596], \"label\": \"List-item\"}, {\"bbox_2d\": [24, 598, 336, 617], \"label\": \"List-item\"}, {\"bbox_2d\": [37, 620, 447, 657], \"label\": \"Text\"}, {\"bbox_2d\": [544, 754, 991, 937], \"label\": \"Text\"}, {\"bbox_2d\": [544, 585, 991, 684], \"label\": \"Text\"}, {\"bbox_2d\": [246, 95, 455, 256], \"label\": \"Text\"}, {\"bbox_2d\": [544, 119, 1000, 496], \"label\": \"Picture\"}, {\"bbox_2d\": [300, 948, 468, 988], \"label\": \"Picture\"}, {\"bbox_2d\": [542, 948, 713, 989], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [20, 39, 282, 77], \"label\": \"Section-header\"}, {\"bbox_2d\": [15, 95, 107, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [247, 95, 340, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [18, 282, 259, 303], \"label\": \"Section-header\"}, {\"bbox_2d\": [18, 526, 361, 546], \"label\": \"Section-header\"}, {\"bbox_2d\": [547, 525, 825, 561], \"label\": \"Section-header\"}, {\"bbox_2d\": [549, 696, 765, 732], \"label\": \"Section-header\"}, {\"bbox_2d\": [975, 956, 984, 977], \"label\": \"Page-footer\"}, {\"bbox_2d\": [30, 956, 40, 977], \"label\": \"Page-footer\"}, {\"bbox_2d\": [17, 117, 200, 180], \"label\": \"List-item\"}, {\"bbox_2d\": [17, 178, 223, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [248, 117, 431, 180], \"label\": \"List-item\"}, {\"bbox_2d\": [248, 178, 454, 256], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 323, 465, 363], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 363, 433, 404], \"label\": \"List-item\"}, {\"bbox_2d\": [37, 404, 445, 506], \"label\": \"Text\"}, {\"bbox_2d\": [18, 555, 454, 595], \"label\": \"List-item\"}, {\"bbox_2d\": [18, 595, 335, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [37, 616, 446, 656], \"label\": \"Text\"}, {\"bbox_2d\": [545, 582, 997, 683], \"label\": \"Text\"}, {\"bbox_2d\": [545, 753, 993, 935], \"label\": \"Text\"}, {\"bbox_2d\": [124, 901, 361, 921], \"label\": \"Caption\"}, {\"bbox_2d\": [103, 683, 384, 883], \"label\": \"Picture\"}, {\"bbox_2d\": [551, 120, 1004, 496], \"label\": \"Picture\"}, {\"bbox_2d\": [551, 40, 911, 77], \"label\": \"Section-header\"}, {\"bbox_2d\": [299, 947, 470, 986], \"label\": \"Picture\"}, {\"bbox_2d\": [544, 948, 714, 988], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [0, 0, 1035, 1035], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [0, -2, 1036, 1036], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [429, 29, 613, 39], \"label\": \"Page-header\"}, {\"bbox_2d\": [148, 87, 931, 192], \"label\": \"Text\"}, {\"bbox_2d\": [148, 220, 853, 258], \"label\": \"Text\"}, {\"bbox_2d\": [148, 510, 828, 551], \"label\": \"Text\"}, {\"bbox_2d\": [148, 603, 902, 686], \"label\": \"Text\"}, {\"bbox_2d\": [148, 737, 907, 819], \"label\": \"Text\"}, {\"bbox_2d\": [148, 869, 910, 907], \"label\": \"Text\"}, {\"bbox_2d\": [148, 848, 415, 862], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 580, 780, 596], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 714, 787, 729], \"label\": \"Section-header\"}, {\"bbox_2d\": [142, 281, 917, 491], \"label\": \"Picture\"}, {\"bbox_2d\": [438, 999, 602, 1007], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [429, 26, 614, 40], \"label\": \"Page-header\"}, {\"bbox_2d\": [147, 82, 934, 192], \"label\": \"Text\"}, {\"bbox_2d\": [147, 215, 861, 259], \"label\": \"Text\"}, {\"bbox_2d\": [147, 503, 838, 552], \"label\": \"Text\"}, {\"bbox_2d\": [147, 599, 910, 687], \"label\": \"Text\"}, {\"bbox_2d\": [147, 732, 915, 820], \"label\": \"Text\"}, {\"bbox_2d\": [147, 865, 917, 909], \"label\": \"Text\"}, {\"bbox_2d\": [438, 996, 602, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 709, 787, 731], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 842, 415, 864], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 574, 781, 599], \"label\": \"Section-header\"}, {\"bbox_2d\": [140, 279, 918, 491], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [333, 83, 910, 130], \"label\": \"Text\"}, {\"bbox_2d\": [333, 138, 469, 154], \"label\": \"Text\"}, {\"bbox_2d\": [394, 154, 744, 167], \"label\": \"Text\"}, {\"bbox_2d\": [394, 179, 633, 192], \"label\": \"Text\"}, {\"bbox_2d\": [457, 219, 904, 261], \"label\": \"Text\"}, {\"bbox_2d\": [394, 273, 484, 286], \"label\": \"Text\"}, {\"bbox_2d\": [455, 288, 864, 316], \"label\": \"Text\"}, {\"bbox_2d\": [333, 375, 895, 403], \"label\": \"Text\"}, {\"bbox_2d\": [333, 415, 516, 428], \"label\": \"Text\"}, {\"bbox_2d\": [394, 492, 910, 520], \"label\": \"Text\"}, {\"bbox_2d\": [394, 532, 633, 545], \"label\": \"Text\"}, {\"bbox_2d\": [394, 557, 760, 567], \"label\": \"Text\"}, {\"bbox_2d\": [394, 579, 468, 591], \"label\": \"Text\"}, {\"bbox_2d\": [455, 595, 894, 638], \"label\": \"Text\"}, {\"bbox_2d\": [394, 649, 484, 662], \"label\": \"Text\"}, {\"bbox_2d\": [455, 665, 788, 678], \"label\": \"Text\"}, {\"bbox_2d\": [333, 735, 818, 747], \"label\": \"Text\"}, {\"bbox_2d\": [333, 759, 516, 771], \"label\": \"Text\"}, {\"bbox_2d\": [394, 838, 881, 865], \"label\": \"Text\"}, {\"bbox_2d\": [394, 891, 774, 904], \"label\": \"Text\"}, {\"bbox_2d\": [394, 916, 554, 928], \"label\": \"Text\"}, {\"bbox_2d\": [333, 820, 412, 832], \"label\": \"Section-header\"}, {\"bbox_2d\": [333, 782, 888, 810], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 916, 554, 928], \"label\": \"Section-header\"}, {\"bbox_2d\": [394, 503, 398, 515], \"label\": \"Text\"}, {\"bbox_2d\": [333, 436, 893, 465], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 351, 400, 363], \"label\": \"Section-header\"}, {\"bbox_2d\": [394, 201, 468, 214], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 712, 440, 724], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 981, 379, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [334, 85, 910, 129], \"label\": \"List-item\"}, {\"bbox_2d\": [334, 140, 469, 153], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 202, 468, 215], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 273, 483, 286], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 351, 399, 364], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 476, 397, 490], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 578, 468, 591], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 649, 483, 662], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 711, 440, 724], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 821, 412, 834], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 915, 554, 928], \"label\": \"Section-header\"}, {\"bbox_2d\": [334, 876, 430, 889], \"label\": \"Section-header\"}, {\"bbox_2d\": [395, 155, 743, 192], \"label\": \"Text\"}, {\"bbox_2d\": [456, 218, 904, 263], \"label\": \"Text\"}, {\"bbox_2d\": [456, 288, 864, 317], \"label\": \"Text\"}, {\"bbox_2d\": [334, 375, 894, 404], \"label\": \"Text\"}, {\"bbox_2d\": [334, 414, 516, 427], \"label\": \"Text\"}, {\"bbox_2d\": [395, 492, 908, 521], \"label\": \"Text\"}, {\"bbox_2d\": [395, 531, 632, 544], \"label\": \"Text\"}, {\"bbox_2d\": [395, 555, 759, 568], \"label\": \"Text\"}, {\"bbox_2d\": [456, 594, 892, 638], \"label\": \"Text\"}, {\"bbox_2d\": [456, 664, 787, 678], \"label\": \"Text\"}, {\"bbox_2d\": [334, 735, 818, 772], \"label\": \"Text\"}, {\"bbox_2d\": [395, 837, 881, 866], \"label\": \"Text\"}, {\"bbox_2d\": [395, 891, 774, 905], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [334, 782, 887, 811], \"label\": \"List-item\"}, {\"bbox_2d\": [334, 437, 891, 466], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [904, 62, 916, 73], \"label\": \"Page-header\"}, {\"bbox_2d\": [26, 238, 61, 710], \"label\": \"Page-header\"}, {\"bbox_2d\": [168, 93, 866, 241], \"label\": \"Title\"}, {\"bbox_2d\": [215, 277, 818, 290], \"label\": \"Text\"}, {\"bbox_2d\": [151, 314, 884, 365], \"label\": \"Text\"}, {\"bbox_2d\": [133, 387, 902, 401], \"label\": \"Text\"}, {\"bbox_2d\": [368, 423, 668, 438], \"label\": \"Text\"}, {\"bbox_2d\": [125, 461, 910, 475], \"label\": \"Text\"}, {\"bbox_2d\": [162, 604, 875, 901], \"label\": \"Text\"}, {\"bbox_2d\": [486, 564, 550, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [869, 958, 913, 969], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 958, 210, 969], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [31, 239, 61, 712], \"label\": \"Page-header\"}, {\"bbox_2d\": [170, 94, 865, 243], \"label\": \"Title\"}, {\"bbox_2d\": [216, 276, 818, 292], \"label\": \"Text\"}, {\"bbox_2d\": [144, 312, 891, 365], \"label\": \"Text\"}, {\"bbox_2d\": [134, 386, 901, 439], \"label\": \"Text\"}, {\"bbox_2d\": [125, 459, 910, 475], \"label\": \"Text\"}, {\"bbox_2d\": [162, 604, 874, 901], \"label\": \"Text\"}, {\"bbox_2d\": [486, 564, 549, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [907, 63, 914, 72], \"label\": \"Page-header\"}, {\"bbox_2d\": [869, 959, 914, 968], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 959, 210, 968], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [273, 84, 774, 98], \"label\": \"Text\"}, {\"bbox_2d\": [273, 109, 591, 121], \"label\": \"Caption\"}, {\"bbox_2d\": [101, 979, 379, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 122, 914, 892], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 774, 98], \"label\": \"Text\"}, {\"bbox_2d\": [272, 110, 593, 121], \"label\": \"Caption\"}, {\"bbox_2d\": [273, 122, 914, 895], \"label\": \"Table\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [43, 72, 335, 103], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 117, 461, 321], \"label\": \"Text\"}, {\"bbox_2d\": [43, 325, 461, 509], \"label\": \"Text\"}, {\"bbox_2d\": [43, 512, 461, 830], \"label\": \"Text\"}, {\"bbox_2d\": [43, 834, 461, 990], \"label\": \"Text\"}, {\"bbox_2d\": [496, 117, 916, 648], \"label\": \"Text\"}, {\"bbox_2d\": [496, 652, 916, 851], \"label\": \"Text\"}, {\"bbox_2d\": [496, 856, 916, 990], \"label\": \"Text\"}, {\"bbox_2d\": [464, 1002, 496, 1019], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [44, 72, 344, 103], \"label\": \"Page-header\"}, {\"bbox_2d\": [496, 114, 924, 244], \"label\": \"Text\"}, {\"bbox_2d\": [42, 115, 473, 320], \"label\": \"Text\"}, {\"bbox_2d\": [492, 246, 928, 645], \"label\": \"Text\"}, {\"bbox_2d\": [43, 321, 471, 507], \"label\": \"Text\"}, {\"bbox_2d\": [44, 508, 476, 826], \"label\": \"Text\"}, {\"bbox_2d\": [500, 648, 922, 846], \"label\": \"Text\"}, {\"bbox_2d\": [499, 849, 933, 987], \"label\": \"Text\"}, {\"bbox_2d\": [46, 829, 470, 986], \"label\": \"Text\"}, {\"bbox_2d\": [466, 997, 498, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 35, 410, 89], \"label\": \"Picture\"}, {\"bbox_2d\": [155, 163, 904, 251], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 269, 904, 370], \"label\": \"List-item\"}, {\"bbox_2d\": [155, 388, 904, 430], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 657, 904, 699], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 716, 904, 744], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 761, 904, 803], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 820, 904, 848], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 865, 904, 907], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 925, 904, 953], \"label\": \"List-item\"}, {\"bbox_2d\": [132, 456, 904, 483], \"label\": \"Text\"}, {\"bbox_2d\": [132, 499, 904, 557], \"label\": \"Text\"}, {\"bbox_2d\": [132, 574, 904, 602], \"label\": \"Text\"}, {\"bbox_2d\": [132, 619, 269, 632], \"label\": \"Section-header\"}, {\"bbox_2d\": [533, 971, 552, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [153, 161, 909, 251], \"label\": \"List-item\"}, {\"bbox_2d\": [153, 266, 908, 370], \"label\": \"List-item\"}, {\"bbox_2d\": [153, 385, 908, 430], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 654, 909, 699], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 714, 907, 744], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 759, 908, 804], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 819, 908, 849], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 864, 908, 909], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 924, 908, 953], \"label\": \"List-item\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [133, 617, 275, 632], \"label\": \"Section-header\"}, {\"bbox_2d\": [133, 453, 908, 482], \"label\": \"Text\"}, {\"bbox_2d\": [133, 497, 908, 557], \"label\": \"Text\"}, {\"bbox_2d\": [133, 572, 908, 602], \"label\": \"Text\"}, {\"bbox_2d\": [122, 34, 410, 86], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 23, 156, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [73, 79, 1009, 103], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 112, 1009, 135], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 143, 815, 154], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 163, 1009, 186], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 195, 586, 204], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 213, 1009, 247], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 256, 1009, 290], \"label\": \"Text\"}, {\"bbox_2d\": [25, 299, 1009, 390], \"label\": \"Text\"}, {\"bbox_2d\": [25, 397, 1009, 455], \"label\": \"Text\"}, {\"bbox_2d\": [25, 464, 1009, 509], \"label\": \"Text\"}, {\"bbox_2d\": [25, 518, 1009, 652], \"label\": \"Text\"}, {\"bbox_2d\": [508, 675, 525, 686], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [73, 80, 1009, 102], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 111, 1009, 133], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 142, 816, 153], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 162, 1009, 184], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 193, 587, 204], \"label\": \"List-item\"}, {\"bbox_2d\": [73, 213, 1009, 246], \"label\": \"List-item\"}, {\"bbox_2d\": [25, 255, 1009, 289], \"label\": \"Text\"}, {\"bbox_2d\": [25, 298, 1009, 388], \"label\": \"Text\"}, {\"bbox_2d\": [25, 397, 1009, 453], \"label\": \"Text\"}, {\"bbox_2d\": [25, 462, 1009, 507], \"label\": \"Text\"}, {\"bbox_2d\": [25, 516, 1009, 652], \"label\": \"Text\"}, {\"bbox_2d\": [508, 675, 525, 686], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 37, 951, 48], \"label\": \"Page-header\"}, {\"bbox_2d\": [540, 69, 948, 553], \"label\": \"Picture\"}, {\"bbox_2d\": [95, 68, 501, 334], \"label\": \"Picture\"}, {\"bbox_2d\": [95, 366, 506, 444], \"label\": \"Caption\"}, {\"bbox_2d\": [95, 708, 506, 749], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 581, 951, 631], \"label\": \"Caption\"}, {\"bbox_2d\": [95, 845, 506, 945], \"label\": \"Text\"}, {\"bbox_2d\": [536, 676, 951, 731], \"label\": \"Text\"}, {\"bbox_2d\": [536, 735, 951, 845], \"label\": \"Text\"}, {\"bbox_2d\": [536, 906, 951, 945], \"label\": \"Text\"}, {\"bbox_2d\": [599, 870, 887, 882], \"label\": \"Section-header\"}, {\"bbox_2d\": [99, 796, 497, 821], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 475, 521, 681], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [84, 67, 497, 343], \"label\": \"Picture\"}, {\"bbox_2d\": [536, 580, 951, 632], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 366, 506, 445], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 710, 506, 748], \"label\": \"Caption\"}, {\"bbox_2d\": [94, 471, 534, 695], \"label\": \"Picture\"}, {\"bbox_2d\": [536, 676, 951, 732], \"label\": \"Text\"}, {\"bbox_2d\": [536, 736, 951, 838], \"label\": \"Text\"}, {\"bbox_2d\": [536, 904, 951, 946], \"label\": \"Text\"}, {\"bbox_2d\": [91, 844, 506, 946], \"label\": \"Text\"}, {\"bbox_2d\": [100, 796, 497, 820], \"label\": \"Section-header\"}, {\"bbox_2d\": [601, 871, 887, 882], \"label\": \"Section-header\"}, {\"bbox_2d\": [934, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [537, 67, 948, 558], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [540, 135, 906, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [1016, 135, 1026, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [513, 234, 884, 269], \"label\": \"Text\"}, {\"bbox_2d\": [87, 234, 479, 335], \"label\": \"Text\"}, {\"bbox_2d\": [87, 369, 477, 537], \"label\": \"Text\"}, {\"bbox_2d\": [87, 571, 479, 626], \"label\": \"Text\"}, {\"bbox_2d\": [521, 297, 908, 512], \"label\": \"Table\"}, {\"bbox_2d\": [96, 652, 467, 876], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [1016, 137, 1028, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [539, 132, 907, 156], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 229, 480, 337], \"label\": \"Text\"}, {\"bbox_2d\": [512, 229, 885, 270], \"label\": \"Text\"}, {\"bbox_2d\": [87, 364, 477, 538], \"label\": \"Text\"}, {\"bbox_2d\": [87, 565, 479, 628], \"label\": \"Text\"}, {\"bbox_2d\": [97, 652, 470, 873], \"label\": \"Table\"}, {\"bbox_2d\": [523, 296, 908, 508], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [124, 164, 823, 203], \"label\": \"Section-header\"}, {\"bbox_2d\": [840, 634, 992, 702], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 814, 1033, 965], \"label\": \"Section-header\"}, {\"bbox_2d\": [107, 271, 355, 502], \"label\": \"Picture\"}, {\"bbox_2d\": [108, 532, 355, 724], \"label\": \"Picture\"}, {\"bbox_2d\": [841, 269, 991, 616], \"label\": \"Picture\"}, {\"bbox_2d\": [108, 742, 340, 781], \"label\": \"Caption\"}, {\"bbox_2d\": [381, 275, 813, 424], \"label\": \"Text\"}, {\"bbox_2d\": [381, 430, 812, 562], \"label\": \"Text\"}, {\"bbox_2d\": [381, 568, 813, 603], \"label\": \"Text\"}, {\"bbox_2d\": [381, 617, 813, 788], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [125, 165, 828, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [375, 269, 813, 423], \"label\": \"Text\"}, {\"bbox_2d\": [375, 429, 809, 535], \"label\": \"Text\"}, {\"bbox_2d\": [375, 541, 815, 602], \"label\": \"Text\"}, {\"bbox_2d\": [375, 616, 812, 789], \"label\": \"Text\"}, {\"bbox_2d\": [841, 632, 991, 702], \"label\": \"Caption\"}, {\"bbox_2d\": [107, 738, 343, 779], \"label\": \"Caption\"}, {\"bbox_2d\": [104, 264, 359, 729], \"label\": \"Picture\"}, {\"bbox_2d\": [838, 264, 996, 622], \"label\": \"Picture\"}, {\"bbox_2d\": [881, 140, 997, 220], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [783, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 322, 448, 333], \"label\": \"Text\"}, {\"bbox_2d\": [131, 342, 501, 422], \"label\": \"Text\"}, {\"bbox_2d\": [130, 913, 498, 933], \"label\": \"Text\"}, {\"bbox_2d\": [131, 944, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [553, 154, 853, 284], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 314, 853, 465], \"label\": \"Picture\"}, {\"bbox_2d\": [551, 496, 853, 973], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 453, 482, 886], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 141, 483, 291], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [779, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 525, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 322, 444, 333], \"label\": \"Text\"}, {\"bbox_2d\": [130, 341, 503, 421], \"label\": \"Text\"}, {\"bbox_2d\": [130, 909, 498, 934], \"label\": \"Text\"}, {\"bbox_2d\": [131, 944, 499, 982], \"label\": \"Text\"}, {\"bbox_2d\": [158, 142, 481, 293], \"label\": \"Picture\"}, {\"bbox_2d\": [159, 447, 482, 881], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 153, 855, 969], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 286, 918, 331], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 359, 506, 576], \"label\": \"Text\"}, {\"bbox_2d\": [91, 578, 506, 730], \"label\": \"Text\"}, {\"bbox_2d\": [91, 779, 412, 797], \"label\": \"Text\"}, {\"bbox_2d\": [91, 800, 506, 969], \"label\": \"Text\"}, {\"bbox_2d\": [536, 361, 918, 513], \"label\": \"Text\"}, {\"bbox_2d\": [536, 516, 918, 590], \"label\": \"Text\"}, {\"bbox_2d\": [536, 609, 594, 623], \"label\": \"Section-header\"}, {\"bbox_2d\": [541, 637, 918, 662], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 666, 918, 711], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 715, 918, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 762, 918, 805], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 809, 918, 842], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 847, 918, 880], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 884, 918, 939], \"label\": \"List-item\"}, {\"bbox_2d\": [491, 994, 498, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [197, 740, 506, 776], \"label\": \"Formula\"}, {\"bbox_2d\": [148, 67, 481, 272], \"label\": \"Picture\"}, {\"bbox_2d\": [543, 69, 870, 294], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [91, 286, 944, 330], \"label\": \"Caption\"}, {\"bbox_2d\": [146, 66, 874, 267], \"label\": \"Picture\"}, {\"bbox_2d\": [533, 359, 944, 513], \"label\": \"Text\"}, {\"bbox_2d\": [91, 359, 503, 575], \"label\": \"Text\"}, {\"bbox_2d\": [533, 515, 944, 591], \"label\": \"Text\"}, {\"bbox_2d\": [91, 578, 502, 732], \"label\": \"Text\"}, {\"bbox_2d\": [533, 612, 624, 624], \"label\": \"Section-header\"}, {\"bbox_2d\": [545, 635, 944, 661], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 665, 944, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 709, 944, 763], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 767, 944, 807], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 810, 944, 851], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 854, 944, 880], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 884, 944, 924], \"label\": \"List-item\"}, {\"bbox_2d\": [545, 927, 944, 968], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 994, 522, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [192, 733, 502, 772], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 783, 410, 796], \"label\": \"Text\"}, {\"bbox_2d\": [91, 799, 502, 968], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [150, 92, 910, 156], \"label\": \"Text\"}, {\"bbox_2d\": [150, 169, 910, 233], \"label\": \"Text\"}, {\"bbox_2d\": [150, 246, 910, 309], \"label\": \"Text\"}, {\"bbox_2d\": [150, 321, 910, 411], \"label\": \"Text\"}, {\"bbox_2d\": [150, 424, 910, 614], \"label\": \"Text\"}, {\"bbox_2d\": [150, 627, 910, 742], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [151, 92, 910, 156], \"label\": \"Text\"}, {\"bbox_2d\": [150, 169, 910, 233], \"label\": \"Text\"}, {\"bbox_2d\": [150, 245, 912, 309], \"label\": \"Text\"}, {\"bbox_2d\": [150, 322, 910, 411], \"label\": \"Text\"}, {\"bbox_2d\": [150, 423, 911, 614], \"label\": \"Text\"}, {\"bbox_2d\": [150, 627, 911, 742], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 417, 99], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 101, 901, 129], \"label\": \"Text\"}, {\"bbox_2d\": [274, 459, 889, 503], \"label\": \"Text\"}, {\"bbox_2d\": [274, 513, 910, 806], \"label\": \"Picture\"}, {\"bbox_2d\": [101, 979, 379, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [274, 135, 910, 432], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [273, 85, 416, 99], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 101, 900, 129], \"label\": \"Text\"}, {\"bbox_2d\": [273, 457, 890, 504], \"label\": \"Text\"}, {\"bbox_2d\": [271, 133, 911, 433], \"label\": \"Table\"}, {\"bbox_2d\": [271, 507, 912, 807], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [114, 62, 313, 88], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 979, 563, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [327, 897, 774, 911], \"label\": \"Caption\"}, {\"bbox_2d\": [147, 114, 911, 867], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [329, 899, 786, 914], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 975, 568, 990], \"label\": \"Page-footer\"}, {\"bbox_2d\": [113, 61, 908, 866], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [152, 83, 815, 105], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 127, 591, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 168, 357, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 198, 616, 210], \"label\": \"Text\"}, {\"bbox_2d\": [152, 287, 568, 301], \"label\": \"Text\"}, {\"bbox_2d\": [152, 465, 474, 477], \"label\": \"Text\"}, {\"bbox_2d\": [152, 794, 648, 806], \"label\": \"Text\"}, {\"bbox_2d\": [152, 824, 935, 853], \"label\": \"Text\"}, {\"bbox_2d\": [152, 893, 944, 976], \"label\": \"Picture\"}, {\"bbox_2d\": [341, 224, 927, 451], \"label\": \"Picture\"}, {\"bbox_2d\": [152, 224, 927, 276], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [152, 85, 815, 104], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 290, 566, 302], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 466, 473, 478], \"label\": \"Text\"}, {\"bbox_2d\": [152, 127, 591, 151], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 168, 357, 180], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 198, 615, 210], \"label\": \"Section-header\"}, {\"bbox_2d\": [151, 225, 925, 272], \"label\": \"Table\"}, {\"bbox_2d\": [150, 317, 925, 447], \"label\": \"Table\"}, {\"bbox_2d\": [152, 795, 647, 807], \"label\": \"Text\"}, {\"bbox_2d\": [152, 826, 935, 854], \"label\": \"Text\"}, {\"bbox_2d\": [152, 895, 192, 907], \"label\": \"Text\"}, {\"bbox_2d\": [152, 931, 331, 943], \"label\": \"Text\"}, {\"bbox_2d\": [152, 966, 224, 978], \"label\": \"Text\"}, {\"bbox_2d\": [557, 895, 613, 907], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [695, 94, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 94, 334, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [282, 113, 351, 123], \"label\": \"Text\"}, {\"bbox_2d\": [163, 127, 475, 257], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 127, 817, 294], \"label\": \"Picture\"}, {\"bbox_2d\": [134, 327, 493, 398], \"label\": \"Picture\"}, {\"bbox_2d\": [184, 449, 404, 568], \"label\": \"Picture\"}, {\"bbox_2d\": [185, 579, 373, 612], \"label\": \"Picture\"}, {\"bbox_2d\": [165, 642, 422, 732], \"label\": \"Picture\"}, {\"bbox_2d\": [165, 750, 424, 842], \"label\": \"Picture\"}, {\"bbox_2d\": [145, 868, 444, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [546, 844, 818, 969], \"label\": \"Picture\"}, {\"bbox_2d\": [536, 716, 817, 832], \"label\": \"Picture\"}, {\"bbox_2d\": [536, 461, 817, 716], \"label\": \"Picture\"}, {\"bbox_2d\": [535, 342, 896, 418], \"label\": \"Picture\"}, {\"bbox_2d\": [557, 262, 817, 287], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [299, 93, 334, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [438, 73, 587, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [694, 93, 727, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 113, 351, 123], \"label\": \"Text\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [159, 128, 500, 257], \"label\": \"Picture\"}, {\"bbox_2d\": [137, 308, 499, 397], \"label\": \"Picture\"}, {\"bbox_2d\": [184, 447, 500, 612], \"label\": \"Picture\"}, {\"bbox_2d\": [169, 647, 499, 841], \"label\": \"Picture\"}, {\"bbox_2d\": [149, 849, 499, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [558, 130, 894, 295], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 345, 892, 413], \"label\": \"Picture\"}, {\"bbox_2d\": [539, 463, 894, 701], \"label\": \"Picture\"}, {\"bbox_2d\": [563, 710, 894, 835], \"label\": \"Picture\"}, {\"bbox_2d\": [565, 847, 894, 962], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [419, 77, 619, 91], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 126, 937, 158], \"label\": \"Text\"}, {\"bbox_2d\": [98, 396, 672, 411], \"label\": \"Text\"}, {\"bbox_2d\": [98, 421, 223, 436], \"label\": \"Text\"}, {\"bbox_2d\": [98, 553, 937, 635], \"label\": \"Text\"}, {\"bbox_2d\": [98, 166, 482, 180], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 201, 447, 215], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 234, 450, 248], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 267, 493, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 300, 493, 314], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 333, 563, 347], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 366, 613, 380], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 447, 937, 477], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 488, 937, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [130, 530, 654, 544], \"label\": \"List-item\"}, {\"bbox_2d\": [926, 961, 937, 1000], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [417, 73, 623, 93], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 121, 946, 158], \"label\": \"Text\"}, {\"bbox_2d\": [98, 393, 678, 413], \"label\": \"Text\"}, {\"bbox_2d\": [98, 417, 229, 437], \"label\": \"Text\"}, {\"bbox_2d\": [98, 549, 946, 636], \"label\": \"Text\"}, {\"bbox_2d\": [895, 956, 942, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 163, 486, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 195, 452, 216], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 228, 454, 249], \"label\": \"List-item\"}, {\"bbox_2d\": [100, 261, 497, 282], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 294, 497, 314], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 327, 567, 347], \"label\": \"List-item\"}, {\"bbox_2d\": [100, 360, 618, 380], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 441, 945, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 483, 947, 520], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 524, 659, 544], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [292, 84, 565, 97], \"label\": \"Text\"}, {\"bbox_2d\": [292, 428, 888, 441], \"label\": \"Text\"}, {\"bbox_2d\": [292, 460, 907, 488], \"label\": \"Text\"}, {\"bbox_2d\": [292, 507, 718, 520], \"label\": \"Text\"}, {\"bbox_2d\": [448, 643, 611, 655], \"label\": \"Text\"}, {\"bbox_2d\": [509, 702, 679, 714], \"label\": \"Text\"}, {\"bbox_2d\": [509, 736, 738, 749], \"label\": \"Text\"}, {\"bbox_2d\": [509, 753, 895, 781], \"label\": \"Text\"}, {\"bbox_2d\": [509, 827, 926, 866], \"label\": \"Text\"}, {\"bbox_2d\": [509, 893, 887, 932], \"label\": \"Text\"}, {\"bbox_2d\": [509, 876, 545, 888], \"label\": \"Text\"}, {\"bbox_2d\": [292, 908, 341, 920], \"label\": \"Text\"}, {\"bbox_2d\": [292, 753, 344, 765], \"label\": \"Text\"}, {\"bbox_2d\": [292, 789, 853, 802], \"label\": \"Text\"}, {\"bbox_2d\": [292, 811, 513, 824], \"label\": \"Text\"}, {\"bbox_2d\": [292, 725, 585, 737], \"label\": \"Text\"}, {\"bbox_2d\": [292, 664, 907, 691], \"label\": \"Text\"}, {\"bbox_2d\": [292, 601, 924, 628], \"label\": \"Text\"}, {\"bbox_2d\": [292, 548, 933, 938], \"label\": \"Table\"}, {\"bbox_2d\": [292, 532, 543, 543], \"label\": \"Caption\"}, {\"bbox_2d\": [629, 979, 932, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [289, 105, 930, 402], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 560, 98], \"label\": \"Text\"}, {\"bbox_2d\": [293, 428, 885, 441], \"label\": \"Text\"}, {\"bbox_2d\": [293, 459, 901, 488], \"label\": \"Text\"}, {\"bbox_2d\": [293, 506, 717, 520], \"label\": \"Text\"}, {\"bbox_2d\": [630, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [293, 531, 544, 542], \"label\": \"Caption\"}, {\"bbox_2d\": [803, 356, 844, 368], \"label\": \"Text\"}, {\"bbox_2d\": [317, 368, 540, 379], \"label\": \"Text\"}, {\"bbox_2d\": [317, 168, 493, 180], \"label\": \"Text\"}, {\"bbox_2d\": [533, 109, 621, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 132, 595, 157], \"label\": \"Table\"}, {\"bbox_2d\": [331, 191, 797, 239], \"label\": \"Table\"}, {\"bbox_2d\": [292, 543, 932, 938], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 64, 493, 543], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 577, 506, 658], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 64, 490, 542], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 578, 506, 658], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [86, 75, 990, 514], \"label\": \"Table\"}, {\"bbox_2d\": [89, 520, 380, 531], \"label\": \"Text\"}, {\"bbox_2d\": [125, 532, 947, 605], \"label\": \"Text\"}, {\"bbox_2d\": [125, 607, 946, 715], \"label\": \"Text\"}, {\"bbox_2d\": [125, 717, 946, 770], \"label\": \"Text\"}, {\"bbox_2d\": [125, 821, 964, 941], \"label\": \"Text\"}, {\"bbox_2d\": [125, 943, 856, 969], \"label\": \"Text\"}, {\"bbox_2d\": [481, 1003, 589, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [501, 771, 598, 800], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [89, 78, 988, 512], \"label\": \"Table\"}, {\"bbox_2d\": [124, 530, 972, 608], \"label\": \"Text\"}, {\"bbox_2d\": [124, 603, 955, 712], \"label\": \"Text\"}, {\"bbox_2d\": [124, 707, 962, 771], \"label\": \"Text\"}, {\"bbox_2d\": [124, 819, 964, 853], \"label\": \"Text\"}, {\"bbox_2d\": [124, 848, 943, 942], \"label\": \"Text\"}, {\"bbox_2d\": [124, 937, 857, 971], \"label\": \"Text\"}, {\"bbox_2d\": [525, 773, 592, 802], \"label\": \"Picture\"}, {\"bbox_2d\": [481, 1003, 589, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [819, 42, 985, 54], \"label\": \"Page-header\"}, {\"bbox_2d\": [831, 190, 925, 203], \"label\": \"Text\"}, {\"bbox_2d\": [103, 132, 489, 150], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 223, 564, 241], \"label\": \"Section-header\"}, {\"bbox_2d\": [592, 223, 838, 241], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 301, 397, 318], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 887, 710, 903], \"label\": \"Footnote\"}, {\"bbox_2d\": [103, 941, 228, 971], \"label\": \"Page-footer\"}, {\"bbox_2d\": [866, 941, 916, 953], \"label\": \"Page-footer\"}, {\"bbox_2d\": [152, 331, 920, 839], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [839, 42, 985, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [104, 124, 490, 151], \"label\": \"Section-header\"}, {\"bbox_2d\": [104, 216, 844, 243], \"label\": \"Section-header\"}, {\"bbox_2d\": [104, 294, 402, 321], \"label\": \"Section-header\"}, {\"bbox_2d\": [830, 184, 924, 205], \"label\": \"Text\"}, {\"bbox_2d\": [151, 331, 919, 839], \"label\": \"Table\"}, {\"bbox_2d\": [104, 885, 714, 904], \"label\": \"Footnote\"}, {\"bbox_2d\": [104, 937, 232, 973], \"label\": \"Page-footer\"}, {\"bbox_2d\": [865, 937, 918, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [255, 94, 927, 431], \"label\": \"Table\"}, {\"bbox_2d\": [109, 457, 769, 474], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 677, 635, 691], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 492, 926, 520], \"label\": \"Text\"}, {\"bbox_2d\": [231, 800, 926, 862], \"label\": \"Text\"}, {\"bbox_2d\": [231, 878, 888, 922], \"label\": \"Text\"}, {\"bbox_2d\": [231, 711, 926, 785], \"label\": \"Text\"}, {\"bbox_2d\": [231, 532, 897, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 568, 926, 612], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 622, 899, 650], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 987, 688, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [253, 91, 928, 433], \"label\": \"Table\"}, {\"bbox_2d\": [109, 458, 769, 473], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 492, 926, 520], \"label\": \"Text\"}, {\"bbox_2d\": [231, 530, 896, 558], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 568, 926, 611], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 622, 899, 649], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 676, 636, 692], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 710, 925, 785], \"label\": \"Text\"}, {\"bbox_2d\": [231, 802, 926, 861], \"label\": \"Text\"}, {\"bbox_2d\": [231, 878, 889, 921], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 685, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [41, 998, 169, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [41, 742, 781, 750], \"label\": \"Text\"}, {\"bbox_2d\": [41, 753, 417, 762], \"label\": \"Text\"}, {\"bbox_2d\": [111, 101, 918, 739], \"label\": \"Table\"}, {\"bbox_2d\": [37, 17, 354, 29], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 142, 916, 299], \"label\": \"Table\"}, {\"bbox_2d\": [110, 343, 919, 538], \"label\": \"Table\"}, {\"bbox_2d\": [110, 581, 915, 737], \"label\": \"Table\"}, {\"bbox_2d\": [146, 754, 532, 762], \"label\": \"Text\"}, {\"bbox_2d\": [113, 742, 783, 750], \"label\": \"Text\"}, {\"bbox_2d\": [642, 555, 764, 579], \"label\": \"Section-header\"}, {\"bbox_2d\": [664, 317, 742, 341], \"label\": \"Section-header\"}, {\"bbox_2d\": [664, 117, 742, 141], \"label\": \"Section-header\"}, {\"bbox_2d\": [113, 101, 252, 112], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [893, 48, 912, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 95, 911, 411], \"label\": \"Text\"}, {\"bbox_2d\": [121, 424, 911, 548], \"label\": \"Text\"}, {\"bbox_2d\": [121, 561, 911, 658], \"label\": \"Text\"}, {\"bbox_2d\": [121, 747, 911, 871], \"label\": \"Text\"}, {\"bbox_2d\": [121, 884, 911, 953], \"label\": \"Text\"}, {\"bbox_2d\": [419, 702, 615, 716], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [891, 49, 911, 65], \"label\": \"Page-header\"}, {\"bbox_2d\": [418, 702, 614, 715], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 95, 911, 412], \"label\": \"Text\"}, {\"bbox_2d\": [121, 423, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [121, 505, 910, 549], \"label\": \"Text\"}, {\"bbox_2d\": [121, 560, 911, 658], \"label\": \"Text\"}, {\"bbox_2d\": [121, 746, 911, 871], \"label\": \"Text\"}, {\"bbox_2d\": [121, 883, 911, 953], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [78, 21, 741, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 77, 391, 95], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 117, 501, 281], \"label\": \"Text\"}, {\"bbox_2d\": [78, 304, 440, 319], \"label\": \"Text\"}, {\"bbox_2d\": [534, 81, 954, 281], \"label\": \"Text\"}, {\"bbox_2d\": [534, 286, 954, 432], \"label\": \"Text\"}, {\"bbox_2d\": [534, 473, 954, 581], \"label\": \"Text\"}, {\"bbox_2d\": [534, 587, 957, 806], \"label\": \"Text\"}, {\"bbox_2d\": [534, 812, 958, 920], \"label\": \"Text\"}, {\"bbox_2d\": [78, 723, 482, 914], \"label\": \"Text\"}, {\"bbox_2d\": [78, 506, 496, 689], \"label\": \"Text\"}, {\"bbox_2d\": [92, 330, 485, 379], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 391, 485, 420], \"label\": \"List-item\"}, {\"bbox_2d\": [92, 433, 475, 483], \"label\": \"List-item\"}, {\"bbox_2d\": [534, 456, 779, 468], \"label\": \"Section-header\"}, {\"bbox_2d\": [479, 945, 556, 1005], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [301, 21, 753, 37], \"label\": \"Page-header\"}, {\"bbox_2d\": [475, 943, 556, 1009], \"label\": \"Picture\"}, {\"bbox_2d\": [78, 116, 507, 281], \"label\": \"Text\"}, {\"bbox_2d\": [78, 303, 444, 318], \"label\": \"Text\"}, {\"bbox_2d\": [78, 505, 501, 688], \"label\": \"Text\"}, {\"bbox_2d\": [78, 692, 486, 913], \"label\": \"Text\"}, {\"bbox_2d\": [534, 809, 963, 918], \"label\": \"Text\"}, {\"bbox_2d\": [534, 585, 964, 806], \"label\": \"Text\"}, {\"bbox_2d\": [534, 472, 959, 581], \"label\": \"Text\"}, {\"bbox_2d\": [534, 285, 959, 431], \"label\": \"Text\"}, {\"bbox_2d\": [534, 97, 960, 281], \"label\": \"Text\"}, {\"bbox_2d\": [534, 78, 955, 94], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 73, 396, 95], \"label\": \"Section-header\"}, {\"bbox_2d\": [534, 453, 780, 469], \"label\": \"Section-header\"}, {\"bbox_2d\": [94, 327, 490, 380], \"label\": \"List-item\"}, {\"bbox_2d\": [94, 387, 490, 422], \"label\": \"List-item\"}, {\"bbox_2d\": [94, 430, 480, 482], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [945, 37, 953, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [528, 73, 953, 322], \"label\": \"Text\"}, {\"bbox_2d\": [528, 338, 953, 716], \"label\": \"Text\"}, {\"bbox_2d\": [83, 444, 506, 718], \"label\": \"Text\"}, {\"bbox_2d\": [83, 732, 506, 979], \"label\": \"Text\"}, {\"bbox_2d\": [528, 732, 953, 979], \"label\": \"Text\"}, {\"bbox_2d\": [83, 410, 331, 419], \"label\": \"Section-header\"}, {\"bbox_2d\": [83, 361, 367, 371], \"label\": \"Caption\"}, {\"bbox_2d\": [98, 74, 496, 340], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [947, 39, 953, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [82, 444, 507, 717], \"label\": \"Text\"}, {\"bbox_2d\": [82, 733, 507, 979], \"label\": \"Text\"}, {\"bbox_2d\": [528, 76, 953, 322], \"label\": \"Text\"}, {\"bbox_2d\": [528, 339, 953, 716], \"label\": \"Text\"}, {\"bbox_2d\": [528, 733, 953, 979], \"label\": \"Text\"}, {\"bbox_2d\": [82, 409, 331, 420], \"label\": \"Section-header\"}, {\"bbox_2d\": [82, 362, 367, 371], \"label\": \"Caption\"}, {\"bbox_2d\": [102, 78, 492, 338], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [60, 44, 553, 62], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 78, 486, 249], \"label\": \"Text\"}, {\"bbox_2d\": [60, 266, 465, 343], \"label\": \"Text\"}, {\"bbox_2d\": [60, 360, 965, 500], \"label\": \"Text\"}, {\"bbox_2d\": [60, 623, 302, 703], \"label\": \"Text\"}, {\"bbox_2d\": [60, 561, 307, 608], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 518, 277, 547], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 72, 440, 91], \"label\": \"Section-header\"}, {\"bbox_2d\": [326, 374, 954, 897], \"label\": \"Picture\"}, {\"bbox_2d\": [33, 400, 43, 633], \"label\": \"Page-header\"}, {\"bbox_2d\": [508, 997, 527, 1010], \"label\": \"Page-footer\"}, {\"bbox_2d\": [486, 77, 970, 308], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [60, 41, 549, 64], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 76, 491, 251], \"label\": \"Text\"}, {\"bbox_2d\": [60, 264, 465, 313], \"label\": \"Text\"}, {\"bbox_2d\": [60, 311, 959, 345], \"label\": \"Text\"}, {\"bbox_2d\": [60, 358, 972, 376], \"label\": \"Text\"}, {\"bbox_2d\": [60, 374, 312, 502], \"label\": \"Text\"}, {\"bbox_2d\": [60, 516, 281, 545], \"label\": \"Caption\"}, {\"bbox_2d\": [60, 559, 313, 609], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 622, 306, 703], \"label\": \"Text\"}, {\"bbox_2d\": [26, 400, 44, 635], \"label\": \"Page-header\"}, {\"bbox_2d\": [492, 80, 969, 306], \"label\": \"Picture\"}, {\"bbox_2d\": [332, 377, 950, 896], \"label\": \"Picture\"}, {\"bbox_2d\": [508, 994, 527, 1015], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [504, 92, 523, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 143, 501, 177], \"label\": \"Text\"}, {\"bbox_2d\": [130, 180, 500, 230], \"label\": \"Text\"}, {\"bbox_2d\": [130, 234, 501, 285], \"label\": \"Text\"}, {\"bbox_2d\": [130, 288, 500, 323], \"label\": \"Text\"}, {\"bbox_2d\": [130, 326, 500, 391], \"label\": \"Text\"}, {\"bbox_2d\": [526, 143, 892, 165], \"label\": \"Text\"}, {\"bbox_2d\": [526, 169, 894, 195], \"label\": \"Text\"}, {\"bbox_2d\": [526, 199, 894, 222], \"label\": \"Text\"}, {\"bbox_2d\": [526, 226, 894, 287], \"label\": \"Text\"}, {\"bbox_2d\": [526, 497, 740, 508], \"label\": \"Text\"}, {\"bbox_2d\": [526, 512, 785, 523], \"label\": \"Text\"}, {\"bbox_2d\": [526, 527, 892, 553], \"label\": \"Text\"}, {\"bbox_2d\": [526, 558, 895, 594], \"label\": \"Text\"}, {\"bbox_2d\": [526, 599, 895, 650], \"label\": \"Text\"}, {\"bbox_2d\": [526, 654, 684, 665], \"label\": \"Text\"}, {\"bbox_2d\": [526, 670, 895, 735], \"label\": \"Text\"}, {\"bbox_2d\": [526, 740, 895, 783], \"label\": \"Text\"}, {\"bbox_2d\": [526, 787, 895, 838], \"label\": \"Text\"}, {\"bbox_2d\": [526, 842, 895, 871], \"label\": \"Text\"}, {\"bbox_2d\": [526, 875, 895, 900], \"label\": \"Text\"}, {\"bbox_2d\": [526, 904, 895, 954], \"label\": \"Text\"}, {\"bbox_2d\": [130, 723, 500, 756], \"label\": \"Text\"}, {\"bbox_2d\": [130, 760, 500, 798], \"label\": \"Text\"}, {\"bbox_2d\": [130, 802, 500, 916], \"label\": \"Text\"}, {\"bbox_2d\": [130, 920, 500, 983], \"label\": \"Text\"}, {\"bbox_2d\": [160, 426, 271, 500], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 505, 268, 584], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 589, 267, 700], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 323, 895, 406], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 421, 895, 472], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 875, 895, 925], \"label\": \"Text\"}, {\"bbox_2d\": [526, 930, 895, 983], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [506, 93, 521, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 142, 499, 178], \"label\": \"Text\"}, {\"bbox_2d\": [130, 181, 501, 231], \"label\": \"Text\"}, {\"bbox_2d\": [130, 234, 500, 284], \"label\": \"Text\"}, {\"bbox_2d\": [131, 287, 501, 323], \"label\": \"Text\"}, {\"bbox_2d\": [130, 327, 500, 389], \"label\": \"Text\"}, {\"bbox_2d\": [526, 142, 895, 165], \"label\": \"Text\"}, {\"bbox_2d\": [526, 170, 895, 193], \"label\": \"Text\"}, {\"bbox_2d\": [526, 198, 890, 221], \"label\": \"Text\"}, {\"bbox_2d\": [526, 226, 896, 289], \"label\": \"Text\"}, {\"bbox_2d\": [527, 497, 740, 507], \"label\": \"Text\"}, {\"bbox_2d\": [526, 528, 890, 551], \"label\": \"Text\"}, {\"bbox_2d\": [527, 512, 783, 522], \"label\": \"Text\"}, {\"bbox_2d\": [526, 557, 895, 593], \"label\": \"Text\"}, {\"bbox_2d\": [526, 599, 898, 635], \"label\": \"Text\"}, {\"bbox_2d\": [527, 640, 684, 651], \"label\": \"Text\"}, {\"bbox_2d\": [526, 656, 894, 732], \"label\": \"Text\"}, {\"bbox_2d\": [526, 737, 895, 760], \"label\": \"Text\"}, {\"bbox_2d\": [130, 720, 503, 757], \"label\": \"Text\"}, {\"bbox_2d\": [130, 760, 498, 796], \"label\": \"Text\"}, {\"bbox_2d\": [130, 801, 501, 915], \"label\": \"Text\"}, {\"bbox_2d\": [130, 920, 500, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 765, 896, 828], \"label\": \"Text\"}, {\"bbox_2d\": [526, 834, 896, 870], \"label\": \"Text\"}, {\"bbox_2d\": [526, 875, 895, 899], \"label\": \"Text\"}, {\"bbox_2d\": [526, 904, 894, 927], \"label\": \"Text\"}, {\"bbox_2d\": [526, 932, 895, 982], \"label\": \"Text\"}, {\"bbox_2d\": [161, 417, 503, 499], \"label\": \"Picture\"}, {\"bbox_2d\": [160, 503, 499, 585], \"label\": \"Picture\"}, {\"bbox_2d\": [157, 586, 501, 698], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 317, 897, 405], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 412, 897, 472], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [135, 139, 900, 634], \"label\": \"Text\"}, {\"bbox_2d\": [357, 890, 671, 993], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [137, 140, 897, 636], \"label\": \"Text\"}, {\"bbox_2d\": [338, 890, 696, 994], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [31, 271, 61, 759], \"label\": \"Page-header\"}, {\"bbox_2d\": [86, 742, 483, 839], \"label\": \"Text\"}, {\"bbox_2d\": [86, 922, 483, 953], \"label\": \"Text\"}, {\"bbox_2d\": [86, 974, 216, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [510, 712, 906, 953], \"label\": \"Text\"}, {\"bbox_2d\": [510, 598, 906, 701], \"label\": \"Text\"}, {\"bbox_2d\": [86, 633, 483, 726], \"label\": \"Text\"}, {\"bbox_2d\": [86, 438, 906, 570], \"label\": \"Text\"}, {\"bbox_2d\": [228, 412, 767, 422], \"label\": \"Text\"}, {\"bbox_2d\": [164, 321, 829, 355], \"label\": \"Text\"}, {\"bbox_2d\": [249, 352, 743, 377], \"label\": \"Text\"}, {\"bbox_2d\": [453, 394, 542, 408], \"label\": \"Text\"}, {\"bbox_2d\": [143, 366, 705, 377], \"label\": \"Text\"}, {\"bbox_2d\": [288, 276, 705, 287], \"label\": \"Text\"}, {\"bbox_2d\": [458, 217, 619, 242], \"label\": \"Text\"}, {\"bbox_2d\": [262, 232, 705, 242], \"label\": \"Text\"}, {\"bbox_2d\": [426, 173, 566, 184], \"label\": \"Text\"}, {\"bbox_2d\": [370, 128, 624, 138], \"label\": \"Text\"}, {\"bbox_2d\": [196, 141, 797, 152], \"label\": \"Text\"}, {\"bbox_2d\": [316, 155, 848, 166], \"label\": \"Text\"}, {\"bbox_2d\": [143, 187, 848, 197], \"label\": \"Text\"}, {\"bbox_2d\": [109, 74, 882, 107], \"label\": \"Title\"}, {\"bbox_2d\": [311, 173, 566, 184], \"label\": \"Section-header\"}, {\"bbox_2d\": [417, 262, 574, 272], \"label\": \"Section-header\"}, {\"bbox_2d\": [463, 306, 528, 317], \"label\": \"Section-header\"}, {\"bbox_2d\": [196, 598, 373, 609], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 875, 475, 901], \"label\": \"Section-header\"}, {\"bbox_2d\": [322, 39, 715, 50], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [321, 39, 715, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [939, 39, 948, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [108, 75, 883, 107], \"label\": \"Title\"}, {\"bbox_2d\": [367, 126, 624, 138], \"label\": \"Text\"}, {\"bbox_2d\": [195, 141, 796, 151], \"label\": \"Text\"}, {\"bbox_2d\": [425, 171, 566, 183], \"label\": \"Text\"}, {\"bbox_2d\": [144, 186, 847, 196], \"label\": \"Text\"}, {\"bbox_2d\": [458, 216, 533, 228], \"label\": \"Text\"}, {\"bbox_2d\": [260, 231, 731, 241], \"label\": \"Text\"}, {\"bbox_2d\": [417, 261, 574, 273], \"label\": \"Text\"}, {\"bbox_2d\": [286, 276, 705, 286], \"label\": \"Text\"}, {\"bbox_2d\": [462, 306, 529, 318], \"label\": \"Text\"}, {\"bbox_2d\": [162, 321, 829, 331], \"label\": \"Text\"}, {\"bbox_2d\": [249, 351, 742, 363], \"label\": \"Text\"}, {\"bbox_2d\": [288, 366, 703, 376], \"label\": \"Text\"}, {\"bbox_2d\": [448, 396, 543, 408], \"label\": \"Text\"}, {\"bbox_2d\": [225, 411, 766, 421], \"label\": \"Text\"}, {\"bbox_2d\": [158, 437, 833, 570], \"label\": \"Text\"}, {\"bbox_2d\": [88, 632, 480, 838], \"label\": \"Text\"}, {\"bbox_2d\": [511, 595, 903, 699], \"label\": \"Text\"}, {\"bbox_2d\": [88, 922, 480, 949], \"label\": \"Text\"}, {\"bbox_2d\": [511, 712, 903, 949], \"label\": \"Text\"}, {\"bbox_2d\": [88, 974, 216, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [94, 875, 474, 899], \"label\": \"Section-header\"}, {\"bbox_2d\": [196, 598, 372, 609], \"label\": \"Section-header\"}, {\"bbox_2d\": [31, 270, 61, 758], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [324, 102, 706, 114], \"label\": \"Page-header\"}, {\"bbox_2d\": [442, 161, 601, 177], \"label\": \"Title\"}, {\"bbox_2d\": [356, 203, 685, 219], \"label\": \"Title\"}, {\"bbox_2d\": [179, 246, 863, 285], \"label\": \"Title\"}, {\"bbox_2d\": [290, 325, 754, 385], \"label\": \"Table\"}, {\"bbox_2d\": [156, 411, 884, 468], \"label\": \"Text\"}, {\"bbox_2d\": [156, 515, 884, 542], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 589, 884, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 624, 673, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 707, 884, 748], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 768, 503, 779], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 659, 767, 671], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 683, 864, 694], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 708, 884, 748], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 828, 884, 854], \"label\": \"List-item\"}, {\"bbox_2d\": [206, 802, 653, 813], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 875, 884, 884], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 888, 884, 917], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 921, 753, 930], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 566, 306, 577], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 491, 385, 502], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [324, 102, 706, 114], \"label\": \"Page-header\"}, {\"bbox_2d\": [442, 160, 601, 178], \"label\": \"Title\"}, {\"bbox_2d\": [356, 202, 687, 220], \"label\": \"Title\"}, {\"bbox_2d\": [180, 244, 863, 285], \"label\": \"Text\"}, {\"bbox_2d\": [288, 323, 756, 387], \"label\": \"Table\"}, {\"bbox_2d\": [156, 411, 884, 468], \"label\": \"Text\"}, {\"bbox_2d\": [156, 515, 884, 542], \"label\": \"Text\"}, {\"bbox_2d\": [156, 491, 385, 502], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 565, 306, 576], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 589, 884, 616], \"label\": \"Text\"}, {\"bbox_2d\": [171, 624, 674, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 658, 768, 670], \"label\": \"Section-header\"}, {\"bbox_2d\": [171, 683, 865, 695], \"label\": \"Text\"}, {\"bbox_2d\": [171, 708, 681, 719], \"label\": \"Text\"}, {\"bbox_2d\": [235, 727, 880, 754], \"label\": \"Text\"}, {\"bbox_2d\": [171, 767, 503, 779], \"label\": \"Text\"}, {\"bbox_2d\": [205, 826, 884, 853], \"label\": \"Text\"}, {\"bbox_2d\": [205, 802, 653, 813], \"label\": \"Section-header\"}, {\"bbox_2d\": [156, 877, 873, 886], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 888, 883, 919], \"label\": \"Footnote\"}, {\"bbox_2d\": [156, 921, 749, 930], \"label\": \"Footnote\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [1, 0, 1035, 1035], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [-1, -1, 1034, 1034], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [368, 81, 668, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [116, 140, 920, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 211, 272, 222], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 572, 437, 583], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 594, 507, 604], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 615, 507, 664], \"label\": \"Text\"}, {\"bbox_2d\": [116, 673, 507, 780], \"label\": \"Text\"}, {\"bbox_2d\": [116, 808, 471, 810], \"label\": \"Section-header\"}, {\"bbox_2d\": [116, 817, 507, 965], \"label\": \"Text\"}, {\"bbox_2d\": [529, 875, 920, 965], \"label\": \"Text\"}, {\"bbox_2d\": [529, 857, 631, 868], \"label\": \"Section-header\"}, {\"bbox_2d\": [529, 808, 920, 839], \"label\": \"Text\"}, {\"bbox_2d\": [529, 648, 920, 791], \"label\": \"Table\"}, {\"bbox_2d\": [529, 446, 921, 636], \"label\": \"Table\"}, {\"bbox_2d\": [529, 211, 920, 378], \"label\": \"Text\"}, {\"bbox_2d\": [116, 234, 507, 554], \"label\": \"Text\"}, {\"bbox_2d\": [529, 384, 920, 434], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [367, 79, 668, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [117, 135, 918, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 209, 271, 221], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 572, 438, 584], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 593, 507, 603], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 798, 471, 808], \"label\": \"Section-header\"}, {\"bbox_2d\": [527, 857, 631, 867], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 229, 511, 552], \"label\": \"Text\"}, {\"bbox_2d\": [115, 612, 511, 662], \"label\": \"Text\"}, {\"bbox_2d\": [115, 670, 511, 779], \"label\": \"Text\"}, {\"bbox_2d\": [115, 816, 511, 964], \"label\": \"Text\"}, {\"bbox_2d\": [527, 210, 923, 377], \"label\": \"Text\"}, {\"bbox_2d\": [527, 385, 923, 435], \"label\": \"Text\"}, {\"bbox_2d\": [527, 807, 923, 838], \"label\": \"Text\"}, {\"bbox_2d\": [527, 875, 923, 964], \"label\": \"Text\"}, {\"bbox_2d\": [523, 445, 919, 636], \"label\": \"Table\"}, {\"bbox_2d\": [523, 642, 924, 795], \"label\": \"Table\"}, {\"bbox_2d\": [512, 999, 523, 1008], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [90, 48, 275, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [154, 119, 877, 927], \"label\": \"Picture\"}, {\"bbox_2d\": [331, 944, 710, 960], \"label\": \"Caption\"}, {\"bbox_2d\": [510, 983, 531, 998], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [89, 45, 287, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [147, 118, 882, 928], \"label\": \"Picture\"}, {\"bbox_2d\": [331, 941, 710, 956], \"label\": \"Caption\"}, {\"bbox_2d\": [512, 979, 536, 994], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 74, 918, 129], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 144, 875, 164], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 165, 468, 186], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 187, 771, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 208, 403, 229], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 229, 594, 250], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 250, 798, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 271, 726, 291], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 291, 630, 312], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 312, 527, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 333, 539, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 354, 506, 375], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 375, 502, 396], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 408, 757, 443], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 443, 394, 464], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 464, 709, 485], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 485, 567, 506], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 506, 596, 527], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 527, 476, 548], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 561, 884, 596], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 596, 884, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 632, 523, 654], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 654, 735, 675], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 675, 394, 696], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 696, 559, 717], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 776, 569, 796], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 796, 427, 817], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 817, 534, 838], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 838, 511, 859], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 859, 241, 880], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 880, 937, 901], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 901, 478, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [88, 922, 525, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [481, 1003, 589, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [88, 950, 331, 973], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 74, 916, 108], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 110, 542, 129], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 165, 467, 185], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 186, 771, 205], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 207, 401, 226], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 228, 594, 247], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 249, 799, 268], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 270, 725, 289], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 291, 629, 310], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 312, 526, 331], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 333, 539, 352], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 353, 510, 373], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 374, 502, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 144, 868, 164], \"label\": \"Text\"}, {\"bbox_2d\": [87, 409, 757, 443], \"label\": \"Text\"}, {\"bbox_2d\": [91, 444, 393, 464], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 465, 709, 485], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 486, 567, 506], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 507, 595, 527], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 528, 477, 547], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 598, 884, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 634, 521, 653], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 655, 735, 674], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 675, 393, 695], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 696, 559, 716], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 796, 426, 815], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 817, 533, 836], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 838, 510, 857], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 859, 241, 878], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 880, 937, 899], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 901, 476, 920], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 921, 524, 941], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 775, 569, 794], \"label\": \"Text\"}, {\"bbox_2d\": [87, 717, 970, 766], \"label\": \"Text\"}, {\"bbox_2d\": [87, 562, 885, 596], \"label\": \"Text\"}, {\"bbox_2d\": [87, 952, 331, 972], \"label\": \"Section-header\"}, {\"bbox_2d\": [481, 1003, 589, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 123, 504, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 292, 288, 307], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 367, 265, 382], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 470, 265, 485], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 555, 262, 570], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 656, 324, 672], \"label\": \"Section-header\"}, {\"bbox_2d\": [218, 790, 382, 806], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 166, 910, 194], \"label\": \"Text\"}, {\"bbox_2d\": [292, 320, 920, 348], \"label\": \"Text\"}, {\"bbox_2d\": [292, 392, 928, 450], \"label\": \"Text\"}, {\"bbox_2d\": [292, 494, 912, 536], \"label\": \"Text\"}, {\"bbox_2d\": [292, 579, 926, 637], \"label\": \"Text\"}, {\"bbox_2d\": [292, 681, 932, 772], \"label\": \"Text\"}, {\"bbox_2d\": [292, 816, 930, 905], \"label\": \"Text\"}, {\"bbox_2d\": [292, 200, 916, 228], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 237, 537, 249], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 258, 613, 268], \"label\": \"List-item\"}, {\"bbox_2d\": [905, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 504, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 291, 288, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 367, 264, 384], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 469, 266, 486], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 555, 262, 572], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 657, 324, 674], \"label\": \"Section-header\"}, {\"bbox_2d\": [217, 790, 382, 807], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 813, 930, 905], \"label\": \"Text\"}, {\"bbox_2d\": [293, 680, 931, 772], \"label\": \"Text\"}, {\"bbox_2d\": [293, 578, 926, 638], \"label\": \"Text\"}, {\"bbox_2d\": [293, 492, 912, 537], \"label\": \"Text\"}, {\"bbox_2d\": [293, 388, 927, 451], \"label\": \"Text\"}, {\"bbox_2d\": [293, 320, 919, 349], \"label\": \"Text\"}, {\"bbox_2d\": [293, 166, 909, 195], \"label\": \"Text\"}, {\"bbox_2d\": [293, 199, 915, 230], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 234, 537, 250], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 254, 612, 269], \"label\": \"List-item\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 97, 286, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 124, 483, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 231, 471, 258], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 490, 165, 503], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 578, 193, 591], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 692, 192, 705], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 424, 826, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 280, 631, 292], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 155, 469, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 188, 361, 201], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 206, 497, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 262, 511, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 334, 511, 417], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 421, 511, 448], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 448, 511, 475], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 368, 946, 395], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 398, 868, 408], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 297, 946, 364], \"label\": \"Text\"}, {\"bbox_2d\": [91, 506, 511, 561], \"label\": \"Text\"}, {\"bbox_2d\": [91, 591, 511, 676], \"label\": \"Text\"}, {\"bbox_2d\": [91, 707, 511, 778], \"label\": \"Text\"}, {\"bbox_2d\": [91, 787, 511, 855], \"label\": \"Text\"}, {\"bbox_2d\": [91, 856, 392, 867], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 872, 352, 882], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 887, 349, 897], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 902, 300, 911], \"label\": \"List-item\"}, {\"bbox_2d\": [526, 673, 946, 694], \"label\": \"Text\"}, {\"bbox_2d\": [544, 718, 651, 735], \"label\": \"Text\"}, {\"bbox_2d\": [526, 449, 946, 662], \"label\": \"Table\"}, {\"bbox_2d\": [526, 709, 946, 846], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 96, 929, 262], \"label\": \"Picture\"}, {\"bbox_2d\": [533, 126, 684, 242], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 100, 816, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 910, 300, 911], \"label\": \"Text\"}, {\"bbox_2d\": [526, 748, 926, 831], \"label\": \"Table\"}, {\"bbox_2d\": [542, 828, 673, 835], \"label\": \"Caption\"}, {\"bbox_2d\": [542, 716, 854, 727], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 565, 749, 572], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 487, 623, 494], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 509, 646, 517], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 576, 749, 583], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 608, 730, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 636, 722, 643], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 658, 749, 665], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 680, 946, 690], \"label\": \"Caption\"}, {\"bbox_2d\": [526, 747, 926, 831], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 586, 749, 593], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 618, 730, 626], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 651, 749, 658], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 685, 946, 694], \"label\": \"Caption\"}, {\"bbox_2d\": [591, 999, 946, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [91, 593, 511, 678], \"label\": \"Text\"}, {\"bbox_2d\": [91, 707, 511, 778], \"label\": \"Text\"}, {\"bbox_2d\": [91, 790, 511, 855], \"label\": \"Text\"}, {\"bbox_2d\": [91, 912, 300, 911], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label\": \"Text\"}, {\"bbox_2d\": [91, 882, 352, 892], \"label", "label": "```json[{\"bbox_2d\": [1002, 94, 1013, 309], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 98, 286, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 124, 477, 151], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 155, 469, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 187, 360, 198], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 203, 497, 214], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 262, 507, 330], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 334, 507, 416], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 421, 507, 445], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 449, 507, 474], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 231, 469, 258], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 492, 165, 503], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 578, 192, 589], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 693, 192, 704], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 507, 507, 560], \"label\": \"Text\"}, {\"bbox_2d\": [91, 593, 507, 675], \"label\": \"Text\"}, {\"bbox_2d\": [91, 708, 507, 776], \"label\": \"Text\"}, {\"bbox_2d\": [91, 786, 507, 853], \"label\": \"Text\"}, {\"bbox_2d\": [91, 858, 393, 868], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 872, 352, 882], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 886, 344, 896], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 901, 300, 911], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 281, 630, 292], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 296, 944, 364], \"label\": \"Text\"}, {\"bbox_2d\": [528, 368, 944, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 397, 868, 407], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 425, 826, 436], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 672, 944, 692], \"label\": \"Text\"}, {\"bbox_2d\": [591, 1000, 945, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [523, 85, 930, 263], \"label\": \"Picture\"}, {\"bbox_2d\": [525, 702, 946, 849], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 449, 948, 665], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [422, 28, 612, 39], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 44, 759, 55], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 17, 173, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 85, 1009, 107], \"label\": \"Text\"}, {\"bbox_2d\": [25, 114, 1009, 177], \"label\": \"Text\"}, {\"bbox_2d\": [25, 184, 1009, 228], \"label\": \"Text\"}, {\"bbox_2d\": [25, 255, 1009, 302], \"label\": \"Text\"}, {\"bbox_2d\": [25, 330, 1009, 434], \"label\": \"Text\"}, {\"bbox_2d\": [25, 438, 1009, 460], \"label\": \"Text\"}, {\"bbox_2d\": [25, 483, 1009, 517], \"label\": \"Text\"}, {\"bbox_2d\": [25, 524, 1009, 568], \"label\": \"Text\"}, {\"bbox_2d\": [25, 589, 1009, 660], \"label\": \"Text\"}, {\"bbox_2d\": [55, 573, 493, 584], \"label\": \"Section-header\"}, {\"bbox_2d\": [57, 467, 421, 478], \"label\": \"Section-header\"}, {\"bbox_2d\": [39, 314, 666, 325], \"label\": \"Section-header\"}, {\"bbox_2d\": [37, 240, 140, 251], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 697, 526, 707], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [58, 17, 173, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [422, 28, 612, 39], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 45, 760, 56], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 85, 1009, 107], \"label\": \"Text\"}, {\"bbox_2d\": [25, 114, 1009, 177], \"label\": \"Text\"}, {\"bbox_2d\": [25, 183, 1009, 228], \"label\": \"Text\"}, {\"bbox_2d\": [25, 257, 1009, 302], \"label\": \"Text\"}, {\"bbox_2d\": [25, 331, 1009, 432], \"label\": \"Text\"}, {\"bbox_2d\": [25, 438, 1009, 460], \"label\": \"Text\"}, {\"bbox_2d\": [25, 484, 1009, 517], \"label\": \"Text\"}, {\"bbox_2d\": [25, 523, 1009, 568], \"label\": \"Text\"}, {\"bbox_2d\": [25, 591, 1009, 659], \"label\": \"Text\"}, {\"bbox_2d\": [55, 574, 494, 585], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 467, 421, 477], \"label\": \"Section-header\"}, {\"bbox_2d\": [40, 314, 667, 324], \"label\": \"Section-header\"}, {\"bbox_2d\": [37, 240, 139, 251], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 696, 525, 707], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [41, 0, 951, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [41, 42, 370, 68], \"label\": \"Text\"}, {\"bbox_2d\": [41, 72, 636, 87], \"label\": \"Text\"}, {\"bbox_2d\": [41, 148, 994, 240], \"label\": \"Text\"}, {\"bbox_2d\": [462, 273, 573, 286], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 320, 887, 331], \"label\": \"Section-header\"}, {\"bbox_2d\": [41, 364, 994, 461], \"label\": \"Text\"}, {\"bbox_2d\": [41, 492, 994, 613], \"label\": \"Text\"}, {\"bbox_2d\": [41, 645, 994, 714], \"label\": \"Text\"}, {\"bbox_2d\": [41, 746, 994, 815], \"label\": \"Text\"}, {\"bbox_2d\": [41, 846, 994, 887], \"label\": \"Text\"}, {\"bbox_2d\": [41, 939, 994, 961], \"label\": \"Text\"}, {\"bbox_2d\": [41, 990, 994, 1005], \"label\": \"Page-footer\"}, {\"bbox_2d\": [930, 1021, 951, 1036], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 1021, 55, 1036], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [530, 0, 1035, 13], \"label\": \"Page-header\"}, {\"bbox_2d\": [0, 0, 510, 13], \"label\": \"Page-header\"}, {\"bbox_2d\": [42, 23, 369, 41], \"label\": \"Text\"}, {\"bbox_2d\": [42, 69, 636, 86], \"label\": \"Text\"}, {\"bbox_2d\": [42, 115, 993, 241], \"label\": \"Text\"}, {\"bbox_2d\": [462, 269, 573, 286], \"label\": \"Section-header\"}, {\"bbox_2d\": [149, 315, 886, 332], \"label\": \"Section-header\"}, {\"bbox_2d\": [42, 361, 993, 459], \"label\": \"Text\"}, {\"bbox_2d\": [42, 488, 993, 613], \"label\": \"Text\"}, {\"bbox_2d\": [42, 642, 993, 713], \"label\": \"Text\"}, {\"bbox_2d\": [42, 742, 993, 813], \"label\": \"Text\"}, {\"bbox_2d\": [42, 842, 993, 886], \"label\": \"Text\"}, {\"bbox_2d\": [42, 915, 993, 959], \"label\": \"Text\"}, {\"bbox_2d\": [42, 987, 993, 1005], \"label\": \"Text\"}, {\"bbox_2d\": [929, 1022, 1036, 1036], \"label\": \"Page-footer\"}, {\"bbox_2d\": [0, 1022, 56, 1036], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [654, 35, 899, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 35, 320, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [419, 66, 617, 85], \"label\": \"Section-header\"}, {\"bbox_2d\": [134, 98, 897, 941], \"label\": \"Table\"}, {\"bbox_2d\": [536, 971, 547, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [654, 35, 903, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [138, 35, 322, 47], \"label\": \"Page-header\"}, {\"bbox_2d\": [419, 65, 622, 84], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 101, 902, 940], \"label\": \"Table\"}, {\"bbox_2d\": [538, 972, 550, 983], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [563, 148, 914, 160], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 148, 480, 175], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 181, 338, 193], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 290, 269, 302], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 462, 458, 474], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 775, 393, 787], \"label\": \"Section-header\"}, {\"bbox_2d\": [862, 999, 979, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [746, 733, 965, 850], \"label\": \"Picture\"}, {\"bbox_2d\": [746, 842, 852, 850], \"label\": \"Caption\"}, {\"bbox_2d\": [301, 715, 390, 723], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 794, 527, 928], \"label\": \"Text\"}, {\"bbox_2d\": [119, 197, 527, 286], \"label\": \"Text\"}, {\"bbox_2d\": [119, 306, 527, 460], \"label\": \"Text\"}, {\"bbox_2d\": [119, 479, 527, 646], \"label\": \"Text\"}, {\"bbox_2d\": [563, 165, 967, 709], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [120, 149, 479, 176], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 176, 333, 199], \"label\": \"Section-header\"}, {\"bbox_2d\": [563, 144, 910, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 286, 265, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 458, 455, 481], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 771, 388, 794], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 197, 525, 286], \"label\": \"Text\"}, {\"bbox_2d\": [120, 307, 524, 458], \"label\": \"Text\"}, {\"bbox_2d\": [120, 479, 525, 599], \"label\": \"Text\"}, {\"bbox_2d\": [120, 605, 296, 646], \"label\": \"Text\"}, {\"bbox_2d\": [563, 165, 970, 708], \"label\": \"Text\"}, {\"bbox_2d\": [301, 716, 385, 723], \"label\": \"Text\"}, {\"bbox_2d\": [747, 844, 851, 851], \"label\": \"Text\"}, {\"bbox_2d\": [120, 793, 525, 928], \"label\": \"Text\"}, {\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [300, 605, 519, 713], \"label\": \"Picture\"}, {\"bbox_2d\": [745, 734, 967, 841], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 97, 595, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 132, 390, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [146, 208, 621, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [146, 558, 480, 571], \"label\": \"List-item\"}, {\"bbox_2d\": [176, 574, 902, 619], \"label\": \"Text\"}, {\"bbox_2d\": [176, 622, 902, 650], \"label\": \"Text\"}, {\"bbox_2d\": [497, 955, 537, 968], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 146, 915, 542], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 599, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 127, 911, 541], \"label\": \"Table\"}, {\"bbox_2d\": [498, 953, 541, 971], \"label\": \"Page-footer\"}, {\"bbox_2d\": [146, 555, 484, 571], \"label\": \"Section-header\"}, {\"bbox_2d\": [176, 571, 901, 620], \"label\": \"Text\"}, {\"bbox_2d\": [176, 619, 901, 652], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [150, 92, 911, 183], \"label\": \"Text\"}, {\"bbox_2d\": [150, 194, 911, 361], \"label\": \"Text\"}, {\"bbox_2d\": [150, 373, 911, 437], \"label\": \"Text\"}, {\"bbox_2d\": [150, 449, 911, 590], \"label\": \"Text\"}, {\"bbox_2d\": [150, 602, 910, 691], \"label\": \"Text\"}, {\"bbox_2d\": [150, 703, 911, 742], \"label\": \"Text\"}, {\"bbox_2d\": [150, 754, 911, 945], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [150, 92, 911, 182], \"label\": \"Text\"}, {\"bbox_2d\": [150, 194, 911, 360], \"label\": \"Text\"}, {\"bbox_2d\": [150, 372, 912, 589], \"label\": \"Text\"}, {\"bbox_2d\": [150, 602, 911, 742], \"label\": \"Text\"}, {\"bbox_2d\": [150, 754, 937, 946], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 42, 306, 73], \"label\": \"Section-header\"}, {\"bbox_2d\": [929, 14, 1016, 83], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 93, 910, 306], \"label\": \"Text\"}, {\"bbox_2d\": [72, 354, 800, 383], \"label\": \"Section-header\"}, {\"bbox_2d\": [67, 812, 162, 830], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 844, 325, 975], \"label\": \"Table\"}, {\"bbox_2d\": [370, 844, 933, 975], \"label\": \"Table\"}, {\"bbox_2d\": [67, 404, 947, 735], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [72, 46, 287, 72], \"label\": \"Section-header\"}, {\"bbox_2d\": [926, 18, 1018, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 93, 910, 305], \"label\": \"Text\"}, {\"bbox_2d\": [72, 355, 800, 381], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 414, 205, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [308, 414, 374, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [529, 414, 615, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [750, 414, 885, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 448, 257, 476], \"label\": \"Text\"}, {\"bbox_2d\": [87, 486, 257, 543], \"label\": \"Text\"}, {\"bbox_2d\": [308, 448, 456, 476], \"label\": \"Text\"}, {\"bbox_2d\": [308, 486, 476, 533], \"label\": \"Text\"}, {\"bbox_2d\": [529, 448, 699, 486], \"label\": \"Text\"}, {\"bbox_2d\": [529, 496, 713, 543], \"label\": \"Text\"}, {\"bbox_2d\": [750, 448, 904, 476], \"label\": \"Text\"}, {\"bbox_2d\": [750, 486, 917, 533], \"label\": \"Text\"}, {\"bbox_2d\": [88, 619, 270, 666], \"label\": \"Text\"}, {\"bbox_2d\": [88, 676, 259, 713], \"label\": \"Text\"}, {\"bbox_2d\": [309, 619, 476, 656], \"label\": \"Text\"}, {\"bbox_2d\": [309, 666, 480, 713], \"label\": \"Text\"}, {\"bbox_2d\": [529, 619, 691, 656], \"label\": \"Text\"}, {\"bbox_2d\": [529, 666, 705, 723], \"label\": \"Text\"}, {\"bbox_2d\": [750, 619, 915, 656], \"label\": \"Text\"}, {\"bbox_2d\": [750, 666, 921, 723], \"label\": \"Text\"}, {\"bbox_2d\": [88, 585, 193, 607], \"label\": \"Section-header\"}, {\"bbox_2d\": [309, 585, 466, 607], \"label\": \"Section-header\"}, {\"bbox_2d\": [529, 585, 686, 607], \"label\": \"Section-header\"}, {\"bbox_2d\": [750, 585, 935, 607], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 815, 161, 827], \"label\": \"Section-header\"}, {\"bbox_2d\": [86, 866, 325, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 847, 239, 857], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 886, 224, 895], \"label\": \"List-item\"}, {\"bbox_2d\": [86, 905, 258, 914], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 924, 283, 934], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 944, 235, 953], \"label\": \"List-item\"}, {\"bbox_2d\": [79, 963, 331, 972], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 847, 627, 857], \"label\": \"List-item\"}, {\"bbox_2d\": [375, 866, 596, 887], \"label\": \"List-item\"}, {\"bbox_2d\": [375, 898, 600, 919], \"label\": \"List-item\"}, {\"bbox_2d\": [376, 929, 608, 938], \"label\": \"List-item\"}, {\"bbox_2d\": [375, 948, 565, 969], \"label\": \"List-item\"}, {\"bbox_2d\": [673, 847, 862, 868], \"label\": \"List-item\"}, {\"bbox_2d\": [673, 878, 885, 899], \"label\": \"List-item\"}, {\"bbox_2d\": [673, 909, 926, 919], \"label\": \"List-item\"}, {\"bbox_2d\": [672, 929, 881, 950], \"label\": \"List-item\"}, {\"bbox_2d\": [666, 960, 931, 969], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [404, 87, 545, 100], \"label\": \"Text\"}, {\"bbox_2d\": [404, 114, 909, 154], \"label\": \"Text\"}, {\"bbox_2d\": [404, 157, 559, 169], \"label\": \"Text\"}, {\"bbox_2d\": [404, 184, 913, 268], \"label\": \"Text\"}, {\"bbox_2d\": [404, 272, 555, 284], \"label\": \"Text\"}, {\"bbox_2d\": [404, 298, 914, 338], \"label\": \"Text\"}, {\"bbox_2d\": [339, 379, 659, 391], \"label\": \"Text\"}, {\"bbox_2d\": [273, 360, 874, 372], \"label\": \"Text\"}, {\"bbox_2d\": [273, 379, 659, 391], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 385, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [340, 87, 360, 99], \"label\": \"List-item\"}, {\"bbox_2d\": [340, 157, 359, 169], \"label\": \"List-item\"}, {\"bbox_2d\": [340, 272, 359, 284], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [339, 89, 539, 101], \"label\": \"List-item\"}, {\"bbox_2d\": [339, 159, 556, 171], \"label\": \"List-item\"}, {\"bbox_2d\": [339, 273, 551, 285], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 360, 868, 393], \"label\": \"List-item\"}, {\"bbox_2d\": [406, 185, 909, 269], \"label\": \"Text\"}, {\"bbox_2d\": [406, 299, 911, 339], \"label\": \"Text\"}, {\"bbox_2d\": [406, 115, 904, 155], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [156, 90, 533, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [172, 115, 767, 127], \"label\": \"Text\"}, {\"bbox_2d\": [156, 134, 878, 161], \"label\": \"Text\"}, {\"bbox_2d\": [156, 169, 878, 211], \"label\": \"Text\"}, {\"bbox_2d\": [156, 219, 878, 261], \"label\": \"Text\"}, {\"bbox_2d\": [156, 269, 878, 307], \"label\": \"Text\"}, {\"bbox_2d\": [156, 318, 878, 359], \"label\": \"Text\"}, {\"bbox_2d\": [156, 367, 878, 409], \"label\": \"Text\"}, {\"bbox_2d\": [156, 417, 878, 444], \"label\": \"Text\"}, {\"bbox_2d\": [172, 452, 878, 464], \"label\": \"Text\"}, {\"bbox_2d\": [187, 472, 662, 484], \"label\": \"Text\"}, {\"bbox_2d\": [187, 491, 878, 518], \"label\": \"Text\"}, {\"bbox_2d\": [226, 561, 679, 573], \"label\": \"Text\"}, {\"bbox_2d\": [172, 581, 878, 622], \"label\": \"Text\"}, {\"bbox_2d\": [156, 630, 878, 671], \"label\": \"Text\"}, {\"bbox_2d\": [156, 679, 878, 721], \"label\": \"Text\"}, {\"bbox_2d\": [172, 729, 878, 771], \"label\": \"Text\"}, {\"bbox_2d\": [172, 779, 349, 791], \"label\": \"Text\"}, {\"bbox_2d\": [187, 799, 625, 811], \"label\": \"Text\"}, {\"bbox_2d\": [187, 819, 625, 831], \"label\": \"Text\"}, {\"bbox_2d\": [187, 839, 763, 851], \"label\": \"Text\"}, {\"bbox_2d\": [187, 859, 650, 871], \"label\": \"Text\"}, {\"bbox_2d\": [509, 979, 526, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [226, 526, 878, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [221, 499, 878, 518], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [156, 91, 538, 103], \"label\": \"Section-header\"}, {\"bbox_2d\": [171, 116, 772, 127], \"label\": \"Text\"}, {\"bbox_2d\": [156, 135, 884, 162], \"label\": \"Text\"}, {\"bbox_2d\": [156, 170, 884, 212], \"label\": \"Text\"}, {\"bbox_2d\": [156, 220, 884, 261], \"label\": \"Text\"}, {\"bbox_2d\": [156, 269, 884, 310], \"label\": \"Text\"}, {\"bbox_2d\": [156, 318, 884, 360], \"label\": \"Text\"}, {\"bbox_2d\": [156, 368, 884, 409], \"label\": \"Text\"}, {\"bbox_2d\": [156, 417, 884, 444], \"label\": \"Text\"}, {\"bbox_2d\": [171, 452, 884, 464], \"label\": \"Text\"}, {\"bbox_2d\": [156, 581, 884, 622], \"label\": \"Text\"}, {\"bbox_2d\": [156, 630, 884, 672], \"label\": \"Text\"}, {\"bbox_2d\": [156, 680, 884, 721], \"label\": \"Text\"}, {\"bbox_2d\": [156, 729, 884, 771], \"label\": \"Text\"}, {\"bbox_2d\": [171, 779, 354, 790], \"label\": \"Text\"}, {\"bbox_2d\": [186, 798, 629, 810], \"label\": \"Text\"}, {\"bbox_2d\": [186, 818, 630, 830], \"label\": \"Text\"}, {\"bbox_2d\": [186, 838, 768, 850], \"label\": \"Text\"}, {\"bbox_2d\": [186, 858, 655, 869], \"label\": \"Text\"}, {\"bbox_2d\": [186, 472, 667, 484], \"label\": \"List-item\"}, {\"bbox_2d\": [186, 492, 884, 518], \"label\": \"List-item\"}, {\"bbox_2d\": [228, 526, 880, 553], \"label\": \"List-item\"}, {\"bbox_2d\": [223, 561, 683, 573], \"label\": \"List-item\"}, {\"bbox_2d\": [509, 979, 527, 991], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [319, 46, 910, 159], \"label\": \"Text\"}, {\"bbox_2d\": [319, 177, 909, 219], \"label\": \"Text\"}, {\"bbox_2d\": [513, 511, 910, 570], \"label\": \"Text\"}, {\"bbox_2d\": [319, 438, 900, 497], \"label\": \"Text\"}, {\"bbox_2d\": [319, 778, 908, 821], \"label\": \"Text\"}, {\"bbox_2d\": [319, 674, 910, 764], \"label\": \"Text\"}, {\"bbox_2d\": [513, 585, 910, 658], \"label\": \"Text\"}, {\"bbox_2d\": [319, 236, 920, 425], \"label\": \"Picture\"}, {\"bbox_2d\": [88, 510, 487, 744], \"label\": \"Picture\"}, {\"bbox_2d\": [90, 989, 103, 998], \"label\": \"Page-footer\"}, {\"bbox_2d\": [49, 156, 260, 170], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [50, 158, 259, 168], \"label\": \"Page-header\"}, {\"bbox_2d\": [319, 47, 911, 158], \"label\": \"Text\"}, {\"bbox_2d\": [319, 176, 908, 220], \"label\": \"Text\"}, {\"bbox_2d\": [323, 237, 638, 246], \"label\": \"Caption\"}, {\"bbox_2d\": [317, 257, 918, 424], \"label\": \"Picture\"}, {\"bbox_2d\": [87, 532, 490, 746], \"label\": \"Picture\"}, {\"bbox_2d\": [92, 513, 330, 521], \"label\": \"Caption\"}, {\"bbox_2d\": [319, 438, 895, 497], \"label\": \"Text\"}, {\"bbox_2d\": [512, 511, 900, 570], \"label\": \"Text\"}, {\"bbox_2d\": [512, 584, 912, 659], \"label\": \"Text\"}, {\"bbox_2d\": [512, 673, 913, 732], \"label\": \"Text\"}, {\"bbox_2d\": [319, 736, 912, 764], \"label\": \"Text\"}, {\"bbox_2d\": [319, 778, 905, 821], \"label\": \"Text\"}, {\"bbox_2d\": [93, 990, 101, 998], \"label\": \"Page-footer\"}, {\"bbox_2d\": [0, 0, 275, 143], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [111, 48, 310, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [194, 92, 841, 913], \"label\": \"Picture\"}, {\"bbox_2d\": [342, 924, 685, 944], \"label\": \"Caption\"}, {\"bbox_2d\": [496, 971, 526, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [110, 51, 322, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [193, 93, 841, 913], \"label\": \"Picture\"}, {\"bbox_2d\": [339, 927, 680, 943], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [97, 444, 938, 596], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [97, 440, 941, 595], \"label\": \"Text\"}, {\"bbox_2d\": [1002, 167, 1011, 234], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [79, 23, 960, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [79, 82, 449, 111], \"label\": \"Text\"}, {\"bbox_2d\": [79, 120, 481, 242], \"label\": \"Text\"}, {\"bbox_2d\": [79, 251, 464, 374], \"label\": \"Text\"}, {\"bbox_2d\": [79, 382, 483, 505], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 468, 787], \"label\": \"Text\"}, {\"bbox_2d\": [79, 794, 457, 918], \"label\": \"Text\"}, {\"bbox_2d\": [513, 82, 912, 168], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 494, 912, 712], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 917, 918], \"label\": \"Text\"}, {\"bbox_2d\": [79, 532, 275, 543], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 195, 637, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 505, 909, 712], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 916, 918], \"label\": \"Text\"}, {\"bbox_2d\": [79, 794, 452, 918], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 461, 787], \"label\": \"Text\"}, {\"bbox_2d\": [79, 251, 464, 374], \"label\": \"Text\"}, {\"bbox_2d\": [79, 120, 481, 242], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 468, 787], \"label\": \"Text\"}, {\"bbox_2d\": [79, 532, 275, 543], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 82, 912, 168], \"label\": \"Text\"}, {\"bbox_2d\": [513, 195, 637, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 251, 464, 374], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 461, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 912, 918], \"label\": \"Text\"}, {\"bbox_2d\": [513, 505, 909, 712], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 468, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 82, 912, 168], \"label\": \"Text\"}, {\"bbox_2d\": [79, 251, 464, 374], \"label\": \"Text\"}, {\"bbox_2d\": [513, 195, 637, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 468, 787], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 464, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 505, 909, 712], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 909, 918], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 464, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 82, 912, 168], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 195, 637, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [79, 251, 464, 374], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 464, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 909, 918], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 468, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 82, 912, 168], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 505, 909, 712], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 909, 918], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 464, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 505, 909, 712], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 909, 918], \"label\": \"Text\"}, {\"bbox_2d\": [513, 82, 912, 168], \"label\": \"Text\"}, {\"bbox_2d\": [79, 251, 464, 374], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 195, 637, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 213, 912, 487], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 464, 787], \"label\": \"Text\"}, {\"bbox_2d\": [513, 719, 909, 918], \"label\": \"Text\"}, {\"bbox_2d\": [513, 82, 912, 168], \"label\": \"Text\"}, {\"bbox_2d\": [513, 213, 912, 4", "label": "```json[{\"bbox_2d\": [78, 79, 454, 112], \"label\": \"Text\"}, {\"bbox_2d\": [78, 116, 484, 243], \"label\": \"Text\"}, {\"bbox_2d\": [78, 247, 469, 375], \"label\": \"Text\"}, {\"bbox_2d\": [78, 378, 486, 506], \"label\": \"Text\"}, {\"bbox_2d\": [513, 79, 918, 168], \"label\": \"Text\"}, {\"bbox_2d\": [513, 210, 922, 487], \"label\": \"Text\"}, {\"bbox_2d\": [513, 491, 915, 712], \"label\": \"Text\"}, {\"bbox_2d\": [78, 547, 476, 787], \"label\": \"Text\"}, {\"bbox_2d\": [78, 791, 460, 918], \"label\": \"Text\"}, {\"bbox_2d\": [513, 716, 922, 918], \"label\": \"Text\"}, {\"bbox_2d\": [478, 945, 557, 1006], \"label\": \"Picture\"}, {\"bbox_2d\": [78, 528, 276, 544], \"label\": \"Section-header\"}, {\"bbox_2d\": [513, 191, 638, 207], \"label\": \"Section-header\"}, {\"bbox_2d\": [301, 21, 753, 37], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [103, 64, 846, 101], \"label\": \"Page-header\"}, {\"bbox_2d\": [103, 135, 345, 154], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 575, 347, 594], \"label\": \"Section-header\"}, {\"bbox_2d\": [142, 162, 855, 213], \"label\": \"Text\"}, {\"bbox_2d\": [142, 223, 855, 274], \"label\": \"Text\"}, {\"bbox_2d\": [142, 284, 855, 334], \"label\": \"Text\"}, {\"bbox_2d\": [142, 345, 855, 396], \"label\": \"Text\"}, {\"bbox_2d\": [142, 406, 855, 471], \"label\": \"Text\"}, {\"bbox_2d\": [142, 482, 855, 516], \"label\": \"Text\"}, {\"bbox_2d\": [142, 526, 820, 542], \"label\": \"Text\"}, {\"bbox_2d\": [142, 766, 414, 816], \"label\": \"Text\"}, {\"bbox_2d\": [436, 603, 680, 819], \"label\": \"Table\"}, {\"bbox_2d\": [908, 944, 925, 959], \"label\": \"Page-footer\"}, {\"bbox_2d\": [893, 962, 999, 1020], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [103, 69, 842, 100], \"label\": \"Page-header\"}, {\"bbox_2d\": [101, 140, 343, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 580, 346, 592], \"label\": \"Section-header\"}, {\"bbox_2d\": [141, 166, 853, 213], \"label\": \"Text\"}, {\"bbox_2d\": [141, 227, 853, 273], \"label\": \"Text\"}, {\"bbox_2d\": [141, 287, 853, 333], \"label\": \"Text\"}, {\"bbox_2d\": [141, 347, 853, 394], \"label\": \"Text\"}, {\"bbox_2d\": [141, 408, 853, 471], \"label\": \"Text\"}, {\"bbox_2d\": [141, 485, 853, 515], \"label\": \"Text\"}, {\"bbox_2d\": [141, 529, 820, 541], \"label\": \"Text\"}, {\"bbox_2d\": [141, 602, 681, 709], \"label\": \"Table\"}, {\"bbox_2d\": [139, 733, 678, 819], \"label\": \"Table\"}, {\"bbox_2d\": [906, 948, 927, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [893, 964, 984, 1018], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 62, 901, 417], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 448, 925, 543], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 578, 506, 600], \"label\": \"Text\"}, {\"bbox_2d\": [91, 604, 506, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 647, 506, 687], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 690, 506, 730], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 733, 506, 787], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 790, 506, 828], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 831, 506, 860], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 863, 506, 903], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 907, 506, 931], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 935, 506, 959], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 578, 925, 600], \"label\": \"Text\"}, {\"bbox_2d\": [568, 604, 925, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 647, 925, 687], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 690, 925, 730], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 733, 925, 761], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 765, 925, 819], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 822, 925, 851], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 855, 925, 903], \"label\": \"List-item\"}, {\"bbox_2d\": [568, 907, 925, 959], \"label\": \"List-item\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [943, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [144, 65, 905, 424], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 448, 951, 543], \"label\": \"Caption\"}, {\"bbox_2d\": [567, 577, 951, 601], \"label\": \"Text\"}, {\"bbox_2d\": [99, 604, 506, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 604, 951, 642], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 645, 951, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 645, 506, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 686, 506, 710], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 686, 951, 724], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 713, 506, 737], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 741, 506, 779], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 727, 881, 737], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 741, 951, 765], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 768, 951, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 823, 951, 874], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 878, 951, 929], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 932, 951, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 782, 506, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 823, 506, 861], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 864, 506, 888], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 891, 506, 929], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 932, 506, 943], \"label\": \"List-item\"}, {\"bbox_2d\": [122, 577, 506, 601], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [746, 14, 915, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [952, 14, 999, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 100, 282, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 413, 227, 426], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 766, 946, 780], \"label\": \"Text\"}, {\"bbox_2d\": [121, 787, 513, 796], \"label\": \"Text\"}, {\"bbox_2d\": [121, 327, 946, 347], \"label\": \"Text\"}, {\"bbox_2d\": [121, 347, 513, 356], \"label\": \"Text\"}, {\"bbox_2d\": [121, 356, 919, 366], \"label\": \"Text\"}, {\"bbox_2d\": [121, 361, 920, 381], \"label\": \"Text\"}, {\"bbox_2d\": [116, 118, 965, 328], \"label\": \"Table\"}, {\"bbox_2d\": [115, 430, 964, 654], \"label\": \"Table\"}, {\"bbox_2d\": [862, 998, 990, 1018], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [953, 18, 998, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [746, 20, 905, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [122, 102, 282, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 329, 949, 348], \"label\": \"Text\"}, {\"bbox_2d\": [159, 350, 513, 358], \"label\": \"Text\"}, {\"bbox_2d\": [159, 361, 923, 379], \"label\": \"Text\"}, {\"bbox_2d\": [120, 656, 949, 674], \"label\": \"Text\"}, {\"bbox_2d\": [159, 677, 513, 685], \"label\": \"Text\"}, {\"bbox_2d\": [122, 416, 227, 426], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 120, 965, 326], \"label\": \"Table\"}, {\"bbox_2d\": [119, 434, 964, 653], \"label\": \"Table\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 914, 187], \"label\": \"Text\"}, {\"bbox_2d\": [121, 406, 914, 523], \"label\": \"Text\"}, {\"bbox_2d\": [121, 537, 914, 679], \"label\": \"Text\"}, {\"bbox_2d\": [121, 862, 914, 921], \"label\": \"Footnote\"}, {\"bbox_2d\": [137, 922, 914, 946], \"label\": \"Footnote\"}, {\"bbox_2d\": [184, 198, 821, 347], \"label\": \"Picture\"}, {\"bbox_2d\": [280, 361, 368, 375], \"label\": \"Caption\"}, {\"bbox_2d\": [291, 388, 724, 405], \"label\": \"Caption\"}, {\"bbox_2d\": [508, 971, 527, 985], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 98, 914, 187], \"label\": \"Text\"}, {\"bbox_2d\": [121, 406, 914, 522], \"label\": \"Text\"}, {\"bbox_2d\": [121, 537, 914, 679], \"label\": \"Text\"}, {\"bbox_2d\": [121, 860, 914, 919], \"label\": \"Footnote\"}, {\"bbox_2d\": [121, 922, 914, 943], \"label\": \"Footnote\"}, {\"bbox_2d\": [509, 971, 526, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [290, 386, 712, 404], \"label\": \"Caption\"}, {\"bbox_2d\": [183, 196, 825, 374], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 107, 886, 149], \"label\": \"Text\"}, {\"bbox_2d\": [147, 183, 886, 758], \"label\": \"Text\"}, {\"bbox_2d\": [147, 796, 886, 886], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [148, 108, 886, 147], \"label\": \"Text\"}, {\"bbox_2d\": [148, 185, 888, 758], \"label\": \"Text\"}, {\"bbox_2d\": [148, 797, 887, 885], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 141, 915, 167], \"label\": \"Text\"}, {\"bbox_2d\": [91, 179, 915, 423], \"label\": \"Text\"}, {\"bbox_2d\": [91, 434, 915, 489], \"label\": \"Text\"}, {\"bbox_2d\": [91, 500, 915, 570], \"label\": \"Text\"}, {\"bbox_2d\": [91, 582, 915, 666], \"label\": \"Text\"}, {\"bbox_2d\": [91, 678, 915, 732], \"label\": \"Text\"}, {\"bbox_2d\": [91, 769, 915, 808], \"label\": \"Text\"}, {\"bbox_2d\": [91, 821, 915, 905], \"label\": \"Text\"}, {\"bbox_2d\": [91, 755, 281, 762], \"label\": \"Section-header\"}, {\"bbox_2d\": [511, 989, 525, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 141, 944, 167], \"label\": \"Text\"}, {\"bbox_2d\": [91, 179, 944, 421], \"label\": \"Text\"}, {\"bbox_2d\": [91, 433, 944, 488], \"label\": \"Text\"}, {\"bbox_2d\": [91, 500, 944, 569], \"label\": \"Text\"}, {\"bbox_2d\": [91, 581, 945, 665], \"label\": \"Text\"}, {\"bbox_2d\": [91, 677, 944, 732], \"label\": \"Text\"}, {\"bbox_2d\": [91, 755, 281, 763], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 769, 948, 810], \"label\": \"Text\"}, {\"bbox_2d\": [91, 822, 944, 905], \"label\": \"Text\"}, {\"bbox_2d\": [510, 988, 527, 1005], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 705, 97], \"label\": \"Text\"}, {\"bbox_2d\": [274, 836, 866, 866], \"label\": \"Text\"}, {\"bbox_2d\": [274, 884, 900, 929], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 378, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [272, 108, 912, 808], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 705, 98], \"label\": \"Text\"}, {\"bbox_2d\": [272, 109, 912, 809], \"label\": \"Table\"}, {\"bbox_2d\": [273, 837, 867, 866], \"label\": \"Text\"}, {\"bbox_2d\": [273, 884, 900, 929], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [120, 45, 371, 67], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 70, 323, 79], \"label\": \"Text\"}, {\"bbox_2d\": [120, 101, 212, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 299, 248, 310], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 538, 254, 550], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 817, 407, 825], \"label\": \"Text\"}, {\"bbox_2d\": [863, 999, 991, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [117, 556, 965, 816], \"label\": \"Table\"}, {\"bbox_2d\": [119, 316, 965, 498], \"label\": \"Table\"}, {\"bbox_2d\": [120, 119, 964, 196], \"label\": \"Table\"}, {\"bbox_2d\": [120, 200, 951, 220], \"label\": \"Text\"}, {\"bbox_2d\": [172, 220, 955, 251], \"label\": \"Text\"}, {\"bbox_2d\": [953, 16, 997, 26], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [120, 47, 371, 67], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 300, 247, 310], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 102, 212, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 539, 254, 549], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 70, 323, 78], \"label\": \"Text\"}, {\"bbox_2d\": [120, 200, 955, 218], \"label\": \"Text\"}, {\"bbox_2d\": [159, 221, 957, 250], \"label\": \"Text\"}, {\"bbox_2d\": [120, 818, 408, 826], \"label\": \"Text\"}, {\"bbox_2d\": [114, 121, 966, 196], \"label\": \"Table\"}, {\"bbox_2d\": [118, 318, 967, 499], \"label\": \"Table\"}, {\"bbox_2d\": [115, 554, 964, 816], \"label\": \"Table\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [953, 18, 998, 28], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [807, 69, 988, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [579, 114, 991, 258], \"label\": \"Text\"}, {\"bbox_2d\": [584, 262, 991, 389], \"label\": \"Text\"}, {\"bbox_2d\": [584, 394, 991, 764], \"label\": \"Text\"}, {\"bbox_2d\": [137, 560, 546, 985], \"label\": \"Text\"}, {\"bbox_2d\": [587, 768, 995, 985], \"label\": \"Text\"}, {\"bbox_2d\": [135, 334, 546, 556], \"label\": \"Text\"}, {\"bbox_2d\": [558, 998, 575, 1010], \"label\": \"Page-footer\"}, {\"bbox_2d\": [132, 115, 542, 315], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [807, 68, 1004, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [581, 111, 999, 260], \"label\": \"Text\"}, {\"bbox_2d\": [582, 261, 1009, 390], \"label\": \"Text\"}, {\"bbox_2d\": [132, 332, 560, 556], \"label\": \"Text\"}, {\"bbox_2d\": [127, 111, 553, 319], \"label\": \"Table\"}, {\"bbox_2d\": [584, 391, 1008, 765], \"label\": \"Text\"}, {\"bbox_2d\": [135, 557, 560, 986], \"label\": \"Text\"}, {\"bbox_2d\": [586, 766, 1009, 985], \"label\": \"Text\"}, {\"bbox_2d\": [558, 996, 581, 1010], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 71, 905, 83], \"label\": \"Text\"}, {\"bbox_2d\": [536, 85, 951, 278], \"label\": \"Text\"}, {\"bbox_2d\": [536, 280, 951, 366], \"label\": \"Text\"}, {\"bbox_2d\": [536, 368, 951, 400], \"label\": \"Text\"}, {\"bbox_2d\": [536, 404, 951, 447], \"label\": \"Text\"}, {\"bbox_2d\": [91, 471, 506, 588], \"label\": \"Text\"}, {\"bbox_2d\": [536, 552, 951, 619], \"label\": \"Text\"}, {\"bbox_2d\": [536, 622, 951, 654], \"label\": \"Text\"}, {\"bbox_2d\": [91, 592, 506, 609], \"label\": \"Text\"}, {\"bbox_2d\": [91, 612, 506, 637], \"label\": \"Text\"}, {\"bbox_2d\": [91, 912, 506, 944], \"label\": \"Text\"}, {\"bbox_2d\": [536, 912, 951, 944], \"label\": \"Text\"}, {\"bbox_2d\": [91, 280, 506, 334], \"label\": \"Text\"}, {\"bbox_2d\": [91, 355, 506, 383], \"label\": \"Text\"}, {\"bbox_2d\": [280, 322, 506, 345], \"label\": \"Formula\"}, {\"bbox_2d\": [114, 378, 506, 406], \"label\": \"Formula\"}, {\"bbox_2d\": [114, 402, 506, 432], \"label\": \"Formula\"}, {\"bbox_2d\": [559, 447, 951, 533], \"label\": \"Formula\"}, {\"bbox_2d\": [108, 706, 951, 886], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 130, 506, 206], \"label\": \"Text\"}, {\"bbox_2d\": [228, 94, 483, 117], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 71, 127, 83], \"label\": \"Text\"}, {\"bbox_2d\": [114, 228, 506, 270], \"label\": \"Formula\"}, {\"bbox_2d\": [231, 406, 506, 488], \"label\": \"Formula\"}, {\"bbox_2d\": [160, 621, 506, 663], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 71, 125, 84], \"label\": \"Text\"}, {\"bbox_2d\": [91, 125, 506, 190], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 904, 84], \"label\": \"Text\"}, {\"bbox_2d\": [536, 87, 951, 233], \"label\": \"Text\"}, {\"bbox_2d\": [536, 226, 951, 285], \"label\": \"Text\"}, {\"bbox_2d\": [536, 278, 951, 377], \"label\": \"Text\"}, {\"bbox_2d\": [536, 379, 951, 416], \"label\": \"Text\"}, {\"bbox_2d\": [227, 96, 369, 114], \"label\": \"Formula\"}, {\"bbox_2d\": [196, 316, 506, 349], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 270, 506, 312], \"label\": \"Text\"}, {\"bbox_2d\": [116, 203, 480, 272], \"label\": \"Formula\"}, {\"bbox_2d\": [114, 397, 506, 463], \"label\": \"Formula\"}, {\"bbox_2d\": [558, 426, 928, 532], \"label\": \"Formula\"}, {\"bbox_2d\": [133, 622, 464, 659], \"label\": \"Formula\"}, {\"bbox_2d\": [107, 706, 951, 869], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 355, 505, 383], \"label\": \"Text\"}, {\"bbox_2d\": [91, 473, 506, 563], \"label\": \"Text\"}, {\"bbox_2d\": [91, 565, 506, 608], \"label\": \"Text\"}, {\"bbox_2d\": [536, 552, 951, 617], \"label\": \"Text\"}, {\"bbox_2d\": [536, 621, 951, 649], \"label\": \"Text\"}, {\"bbox_2d\": [91, 914, 506, 942], \"label\": \"Text\"}, {\"bbox_2d\": [536, 914, 951, 942], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [129, 18, 261, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 101, 599, 138], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 138, 367, 163], \"label\": \"Text\"}, {\"bbox_2d\": [69, 175, 919, 760], \"label\": \"Table\"}, {\"bbox_2d\": [44, 997, 170, 1015], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [70, 175, 915, 760], \"label\": \"Table\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [72, 102, 600, 137], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 153, 313, 162], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 141, 367, 149], \"label\": \"Text\"}, {\"bbox_2d\": [130, 20, 261, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [37, 18, 82, 28], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [73, 207, 913, 231], \"label\": \"Text\"}, {\"bbox_2d\": [73, 361, 801, 373], \"label\": \"Text\"}, {\"bbox_2d\": [73, 499, 567, 511], \"label\": \"Text\"}, {\"bbox_2d\": [73, 689, 775, 701], \"label\": \"Text\"}, {\"bbox_2d\": [73, 479, 328, 491], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 667, 455, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 999, 223, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [16, 775, 29, 863], \"label\": \"Page-header\"}, {\"bbox_2d\": [71, 232, 919, 347], \"label\": \"Table\"}, {\"bbox_2d\": [70, 382, 919, 440], \"label\": \"Table\"}, {\"bbox_2d\": [70, 520, 919, 609], \"label\": \"Table\"}, {\"bbox_2d\": [70, 710, 919, 828], \"label\": \"Table\"}, {\"bbox_2d\": [72, 488, 919, 609], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [72, 231, 918, 348], \"label\": \"Table\"}, {\"bbox_2d\": [74, 209, 913, 233], \"label\": \"Text\"}, {\"bbox_2d\": [73, 380, 914, 440], \"label\": \"Table\"}, {\"bbox_2d\": [74, 363, 799, 374], \"label\": \"Text\"}, {\"bbox_2d\": [74, 481, 327, 491], \"label\": \"Section-header\"}, {\"bbox_2d\": [74, 499, 566, 510], \"label\": \"Text\"}, {\"bbox_2d\": [74, 670, 455, 680], \"label\": \"Section-header\"}, {\"bbox_2d\": [74, 691, 775, 703], \"label\": \"Text\"}, {\"bbox_2d\": [72, 520, 920, 608], \"label\": \"Table\"}, {\"bbox_2d\": [72, 711, 917, 830], \"label\": \"Table\"}, {\"bbox_2d\": [12, 770, 22, 860], \"label\": \"Page-header\"}, {\"bbox_2d\": [51, 1002, 223, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [330, 92, 705, 456], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 485, 911, 557], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 585, 911, 957], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 485, 911, 556], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 585, 911, 957], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [332, 94, 702, 454], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 164], \"label\": \"Text\"}, {\"bbox_2d\": [121, 177, 911, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 445, 911, 707], \"label\": \"Text\"}, {\"bbox_2d\": [121, 797, 911, 866], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 750, 412, 763], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 400, 279, 413], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 165], \"label\": \"Text\"}, {\"bbox_2d\": [121, 177, 911, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 445, 911, 707], \"label\": \"Text\"}, {\"bbox_2d\": [121, 795, 911, 866], \"label\": \"Text\"}, {\"bbox_2d\": [121, 751, 411, 764], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 401, 279, 414], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 49, 295, 79], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 127, 917, 161], \"label\": \"Text\"}, {\"bbox_2d\": [121, 160, 354, 174], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 294, 462, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 540, 275, 554], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 656, 318, 670], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 178, 917, 241], \"label\": \"Text\"}, {\"bbox_2d\": [121, 244, 917, 291], \"label\": \"Text\"}, {\"bbox_2d\": [183, 309, 883, 341], \"label\": \"Text\"}, {\"bbox_2d\": [121, 343, 894, 374], \"label\": \"Text\"}, {\"bbox_2d\": [121, 377, 917, 424], \"label\": \"Text\"}, {\"bbox_2d\": [121, 425, 917, 473], \"label\": \"Text\"}, {\"bbox_2d\": [121, 476, 910, 537], \"label\": \"Text\"}, {\"bbox_2d\": [121, 558, 917, 591], \"label\": \"Text\"}, {\"bbox_2d\": [121, 592, 917, 655], \"label\": \"Text\"}, {\"bbox_2d\": [183, 674, 802, 688], \"label\": \"Text\"}, {\"bbox_2d\": [121, 690, 901, 754], \"label\": \"Text\"}, {\"bbox_2d\": [121, 757, 917, 805], \"label\": \"Text\"}, {\"bbox_2d\": [121, 808, 883, 838], \"label\": \"Text\"}, {\"bbox_2d\": [121, 841, 917, 888], \"label\": \"Text\"}, {\"bbox_2d\": [121, 891, 903, 921], \"label\": \"Text\"}, {\"bbox_2d\": [121, 974, 443, 987], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"Text\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"Text\"}, {\"bbox_2d\": [121, 126, 958, 159], \"label\": \"Text\"}, {\"bbox_2d\": [121, 176, 939, 242], \"label\": \"Text\"}, {\"bbox_2d\": [121, 242, 939, 291], \"label\": \"Text\"}, {\"bbox_2d\": [121, 342, 925, 374], \"label\": \"Text\"}, {\"bbox_2d\": [121, 308, 905, 341], \"label\": \"Text\"}, {\"bbox_2d\": [121, 375, 960, 424], \"label\": \"Text\"}, {\"bbox_2d\": [121, 424, 957, 474], \"label\": \"Text\"}, {\"bbox_2d\": [121, 474, 941, 540], \"label\": \"Text\"}, {\"bbox_2d\": [121, 557, 947, 589], \"label\": \"Text\"}, {\"bbox_2d\": [121, 590, 954, 655], \"label\": \"Text\"}, {\"bbox_2d\": [182, 672, 817, 689], \"label\": \"Text\"}, {\"bbox_2d\": [121, 689, 912, 755], \"label\": \"Text\"}, {\"bbox_2d\": [121, 755, 955, 804], \"label\": \"Text\"}, {\"bbox_2d\": [121, 805, 910, 838], \"label\": \"Text\"}, {\"bbox_2d\": [121, 838, 947, 887], \"label\": \"Text\"}, {\"bbox_2d\": [121, 888, 915, 920], \"label\": \"Text\"}, {\"bbox_2d\": [121, 656, 316, 672], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 540, 274, 556], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 159, 353, 175], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 292, 471, 308], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [189, 111, 880, 142], \"label\": \"Text\"}, {\"bbox_2d\": [172, 151, 863, 182], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 191, 880, 242], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 251, 855, 302], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 311, 889, 426], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 435, 874, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 495, 880, 564], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 574, 874, 666], \"label\": \"List-item\"}, {\"bbox_2d\": [172, 675, 886, 749], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 757, 892, 929], \"label\": \"List-item\"}, {\"bbox_2d\": [513, 961, 522, 973], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [190, 106, 887, 142], \"label\": \"Text\"}, {\"bbox_2d\": [512, 958, 528, 974], \"label\": \"Page-footer\"}, {\"bbox_2d\": [171, 146, 867, 183], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 187, 886, 243], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 247, 860, 304], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 308, 896, 425], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 429, 879, 486], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 490, 886, 566], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 570, 879, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [171, 671, 891, 748], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 752, 897, 929], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [68, 52, 273, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [69, 94, 123, 111], \"label\": \"Text\"}, {\"bbox_2d\": [517, 94, 926, 148], \"label\": \"Text\"}, {\"bbox_2d\": [516, 317, 926, 519], \"label\": \"Text\"}, {\"bbox_2d\": [516, 521, 926, 667], \"label\": \"Text\"}, {\"bbox_2d\": [516, 672, 926, 965], \"label\": \"Text\"}, {\"bbox_2d\": [63, 414, 473, 467], \"label\": \"Text\"}, {\"bbox_2d\": [63, 594, 472, 685], \"label\": \"Text\"}, {\"bbox_2d\": [63, 798, 473, 965], \"label\": \"Text\"}, {\"bbox_2d\": [63, 269, 98, 283], \"label\": \"Text\"}, {\"bbox_2d\": [107, 116, 479, 188], \"label\": \"Text\"}, {\"bbox_2d\": [106, 192, 478, 262], \"label\": \"Text\"}, {\"bbox_2d\": [114, 289, 339, 396], \"label\": \"Text\"}, {\"bbox_2d\": [522, 153, 926, 306], \"label\": \"Table\"}, {\"bbox_2d\": [152, 695, 426, 726], \"label\": \"Formula\"}, {\"bbox_2d\": [152, 747, 425, 777], \"label\": \"Formula\"}, {\"bbox_2d\": [478, 977, 505, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [70, 50, 293, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [518, 93, 936, 128], \"label\": \"Text\"}, {\"bbox_2d\": [69, 94, 131, 111], \"label\": \"Text\"}, {\"bbox_2d\": [67, 265, 105, 282], \"label\": \"Text\"}, {\"bbox_2d\": [105, 287, 350, 397], \"label\": \"Text\"}, {\"bbox_2d\": [66, 411, 487, 465], \"label\": \"Text\"}, {\"bbox_2d\": [516, 314, 937, 517], \"label\": \"Text\"}, {\"bbox_2d\": [521, 152, 926, 308], \"label\": \"Table\"}, {\"bbox_2d\": [106, 114, 492, 187], \"label\": \"List-item\"}, {\"bbox_2d\": [107, 188, 484, 262], \"label\": \"List-item\"}, {\"bbox_2d\": [105, 467, 424, 485], \"label\": \"Text\"}, {\"bbox_2d\": [105, 485, 355, 503], \"label\": \"Text\"}, {\"bbox_2d\": [105, 504, 480, 540], \"label\": \"Text\"}, {\"bbox_2d\": [104, 541, 401, 558], \"label\": \"Text\"}, {\"bbox_2d\": [104, 559, 452, 577], \"label\": \"Text\"}, {\"bbox_2d\": [63, 591, 490, 682], \"label\": \"Text\"}, {\"bbox_2d\": [515, 518, 930, 666], \"label\": \"Text\"}, {\"bbox_2d\": [513, 667, 938, 963], \"label\": \"Text\"}, {\"bbox_2d\": [62, 794, 485, 961], \"label\": \"Text\"}, {\"bbox_2d\": [479, 973, 511, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [819, 128, 943, 141], \"label\": \"Page-header\"}, {\"bbox_2d\": [106, 165, 930, 187], \"label\": \"Section-header\"}, {\"bbox_2d\": [226, 211, 816, 249], \"label\": \"Section-header\"}, {\"bbox_2d\": [421, 397, 616, 411], \"label\": \"Section-header\"}, {\"bbox_2d\": [317, 433, 724, 448], \"label\": \"Text\"}, {\"bbox_2d\": [353, 470, 685, 484], \"label\": \"Text\"}, {\"bbox_2d\": [94, 982, 195, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [903, 962, 948, 1003], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [818, 122, 944, 144], \"label\": \"Page-header\"}, {\"bbox_2d\": [106, 156, 929, 189], \"label\": \"Section-header\"}, {\"bbox_2d\": [224, 205, 817, 251], \"label\": \"Section-header\"}, {\"bbox_2d\": [315, 392, 724, 485], \"label\": \"Text\"}, {\"bbox_2d\": [95, 978, 196, 997], \"label\": \"Page-footer\"}, {\"bbox_2d\": [902, 957, 947, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [277, 39, 672, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 75, 310, 88], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 753, 362, 766], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 713, 894, 726], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 506, 777, 519], \"label\": \"Section-header\"}, {\"bbox_2d\": [511, 77, 904, 104], \"label\": \"Text\"}, {\"bbox_2d\": [536, 119, 904, 146], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 161, 904, 188], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 203, 904, 229], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 244, 904, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 286, 904, 312], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 327, 904, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 369, 904, 470], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 106, 481, 193], \"label\": \"Text\"}, {\"bbox_2d\": [111, 210, 481, 297], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 312, 481, 473], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 488, 481, 619], \"label\": \"List-item\"}, {\"bbox_2d\": [111, 632, 481, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 538, 904, 684], \"label\": \"Text\"}, {\"bbox_2d\": [536, 742, 904, 947], \"label\": \"Text\"}, {\"bbox_2d\": [87, 784, 481, 947], \"label\": \"Text\"}, {\"bbox_2d\": [42, 974, 172, 986], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [43, 38, 52, 51], \"label\": \"Page-header\"}, {\"bbox_2d\": [276, 38, 670, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [88, 75, 310, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [511, 504, 778, 518], \"label\": \"Section-header\"}, {\"bbox_2d\": [511, 711, 894, 725], \"label\": \"Section-header\"}, {\"bbox_2d\": [88, 753, 361, 767], \"label\": \"Section-header\"}, {\"bbox_2d\": [43, 974, 172, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [511, 740, 904, 857], \"label\": \"Text\"}, {\"bbox_2d\": [511, 860, 904, 948], \"label\": \"Text\"}, {\"bbox_2d\": [88, 785, 481, 948], \"label\": \"Text\"}, {\"bbox_2d\": [511, 536, 904, 684], \"label\": \"Text\"}, {\"bbox_2d\": [511, 76, 904, 104], \"label\": \"Text\"}, {\"bbox_2d\": [88, 106, 481, 194], \"label\": \"Text\"}, {\"bbox_2d\": [115, 210, 481, 298], \"label\": \"List-item\"}, {\"bbox_2d\": [115, 310, 481, 473], \"label\": \"List-item\"}, {\"bbox_2d\": [115, 486, 481, 619], \"label\": \"List-item\"}, {\"bbox_2d\": [115, 631, 480, 719], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 120, 904, 148], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 161, 904, 189], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 202, 904, 230], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 243, 903, 271], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 284, 903, 312], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 325, 903, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [538, 367, 904, 470], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [680, 14, 917, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [952, 14, 997, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [862, 998, 990, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [162, 905, 671, 925], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 927, 951, 950], \"label\": \"List-item\"}, {\"bbox_2d\": [162, 879, 958, 893], \"label\": \"Text\"}, {\"bbox_2d\": [159, 102, 966, 876], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [681, 18, 997, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [161, 883, 961, 903], \"label\": \"Text\"}, {\"bbox_2d\": [195, 906, 786, 915], \"label\": \"Text\"}, {\"bbox_2d\": [207, 918, 675, 926], \"label\": \"Text\"}, {\"bbox_2d\": [195, 930, 954, 950], \"label\": \"Text\"}, {\"bbox_2d\": [160, 498, 954, 876], \"label\": \"Table\"}, {\"bbox_2d\": [161, 103, 954, 482], \"label\": \"Table\"}, {\"bbox_2d\": [863, 1001, 992, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 329, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [699, 93, 723, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 113, 351, 123], \"label\": \"Text\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [158, 127, 289, 441], \"label\": \"Picture\"}, {\"bbox_2d\": [158, 578, 316, 908], \"label\": \"Picture\"}, {\"bbox_2d\": [552, 574, 697, 872], \"label\": \"Picture\"}, {\"bbox_2d\": [850, 574, 893, 582], \"label\": \"Text\"}, {\"bbox_2d\": [457, 598, 495, 606], \"label\": \"Text\"}, {\"bbox_2d\": [457, 127, 499, 136], \"label\": \"Text\"}, {\"bbox_2d\": [569, 127, 709, 410], \"label\": \"Picture\"}, {\"bbox_2d\": [850, 127, 895, 136], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 92, 326, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 93, 723, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [282, 113, 351, 123], \"label\": \"Text\"}, {\"bbox_2d\": [677, 113, 746, 123], \"label\": \"Text\"}, {\"bbox_2d\": [161, 130, 499, 437], \"label\": \"Picture\"}, {\"bbox_2d\": [161, 597, 500, 907], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 129, 895, 411], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 573, 895, 875], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 288, 503, 386], \"label\": \"Text\"}, {\"bbox_2d\": [147, 391, 505, 575], \"label\": \"Text\"}, {\"bbox_2d\": [147, 580, 509, 695], \"label\": \"Text\"}, {\"bbox_2d\": [548, 288, 910, 335], \"label\": \"Text\"}, {\"bbox_2d\": [548, 339, 901, 472], \"label\": \"Text\"}, {\"bbox_2d\": [548, 476, 910, 695], \"label\": \"Text\"}, {\"bbox_2d\": [931, 999, 983, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [148, 843, 396, 949], \"label\": \"Picture\"}, {\"bbox_2d\": [403, 843, 653, 949], \"label\": \"Picture\"}, {\"bbox_2d\": [657, 843, 909, 949], \"label\": \"Picture\"}, {\"bbox_2d\": [147, 951, 186, 961], \"label\": \"Caption\"}, {\"bbox_2d\": [404, 951, 440, 961], \"label\": \"Caption\"}, {\"bbox_2d\": [661, 951, 724, 961], \"label\": \"Caption\"}, {\"bbox_2d\": [1010, 678, 1034, 760], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [148, 290, 504, 388], \"label\": \"Text\"}, {\"bbox_2d\": [148, 393, 509, 577], \"label\": \"Text\"}, {\"bbox_2d\": [148, 582, 511, 698], \"label\": \"Text\"}, {\"bbox_2d\": [548, 479, 913, 698], \"label\": \"Text\"}, {\"bbox_2d\": [548, 290, 912, 336], \"label\": \"Text\"}, {\"bbox_2d\": [548, 341, 900, 474], \"label\": \"Text\"}, {\"bbox_2d\": [1014, 683, 1024, 740], \"label\": \"Page-header\"}, {\"bbox_2d\": [813, 1002, 985, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [148, 952, 184, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [405, 952, 440, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [661, 952, 723, 959], \"label\": \"Caption\"}, {\"bbox_2d\": [147, 843, 396, 949], \"label\": \"Picture\"}, {\"bbox_2d\": [404, 843, 651, 948], \"label\": \"Picture\"}, {\"bbox_2d\": [660, 843, 910, 948], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 46, 488, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [824, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [848, 122, 881, 164], \"label\": \"Picture\"}, {\"bbox_2d\": [54, 92, 407, 312], \"label\": \"Picture\"}, {\"bbox_2d\": [232, 333, 862, 405], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 457, 860, 483], \"label\": \"Text\"}, {\"bbox_2d\": [231, 500, 495, 512], \"label\": \"Text\"}, {\"bbox_2d\": [231, 522, 408, 534], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 539, 761, 550], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 556, 716, 567], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 988, 435, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [907, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 46, 489, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [824, 45, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [847, 121, 884, 170], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 333, 876, 405], \"label\": \"Section-header\"}, {\"bbox_2d\": [55, 85, 410, 318], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 456, 866, 483], \"label\": \"Text\"}, {\"bbox_2d\": [231, 500, 495, 512], \"label\": \"Text\"}, {\"bbox_2d\": [231, 522, 408, 534], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 538, 762, 550], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 554, 715, 566], \"label\": \"List-item\"}, {\"bbox_2d\": [907, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 988, 435, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 92, 911, 134], \"label\": \"Text\"}, {\"bbox_2d\": [125, 146, 911, 376], \"label\": \"Text\"}, {\"bbox_2d\": [125, 388, 911, 511], \"label\": \"Text\"}, {\"bbox_2d\": [125, 557, 414, 571], \"label\": \"Text\"}, {\"bbox_2d\": [125, 637, 911, 707], \"label\": \"Text\"}, {\"bbox_2d\": [125, 754, 911, 876], \"label\": \"Text\"}, {\"bbox_2d\": [125, 888, 911, 930], \"label\": \"Text\"}, {\"bbox_2d\": [508, 974, 527, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [334, 520, 693, 538], \"label\": \"Formula\"}, {\"bbox_2d\": [370, 597, 666, 615], \"label\": \"Formula\"}, {\"bbox_2d\": [368, 717, 911, 735], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [125, 92, 911, 132], \"label\": \"Text\"}, {\"bbox_2d\": [125, 146, 911, 375], \"label\": \"Text\"}, {\"bbox_2d\": [125, 388, 911, 510], \"label\": \"Text\"}, {\"bbox_2d\": [125, 558, 414, 572], \"label\": \"Text\"}, {\"bbox_2d\": [125, 636, 911, 706], \"label\": \"Text\"}, {\"bbox_2d\": [125, 755, 911, 876], \"label\": \"Text\"}, {\"bbox_2d\": [125, 890, 911, 930], \"label\": \"Text\"}, {\"bbox_2d\": [372, 719, 910, 743], \"label\": \"Formula\"}, {\"bbox_2d\": [372, 597, 664, 622], \"label\": \"Formula\"}, {\"bbox_2d\": [335, 522, 694, 546], \"label\": \"Formula\"}, {\"bbox_2d\": [508, 974, 527, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 47, 388, 59], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 79, 760, 91], \"label\": \"Text\"}, {\"bbox_2d\": [62, 236, 972, 279], \"label\": \"Text\"}, {\"bbox_2d\": [62, 330, 959, 358], \"label\": \"Text\"}, {\"bbox_2d\": [62, 480, 957, 572], \"label\": \"Text\"}, {\"bbox_2d\": [62, 590, 968, 712], \"label\": \"Text\"}, {\"bbox_2d\": [62, 808, 968, 884], \"label\": \"Text\"}, {\"bbox_2d\": [62, 902, 972, 945], \"label\": \"Text\"}, {\"bbox_2d\": [62, 777, 363, 789], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 449, 563, 461], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 415, 33, 624], \"label\": \"Page-header\"}, {\"bbox_2d\": [62, 298, 475, 310], \"label\": \"Section-header\"}, {\"bbox_2d\": [90, 101, 945, 216], \"label\": \"Picture\"}, {\"bbox_2d\": [410, 388, 625, 429], \"label\": \"Picture\"}, {\"bbox_2d\": [493, 722, 582, 762], \"label\": \"Picture\"}, {\"bbox_2d\": [509, 999, 526, 1013], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [508, 993, 527, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [60, 74, 759, 92], \"label\": \"Text\"}, {\"bbox_2d\": [60, 231, 976, 280], \"label\": \"Text\"}, {\"bbox_2d\": [60, 325, 959, 359], \"label\": \"Text\"}, {\"bbox_2d\": [60, 475, 962, 572], \"label\": \"Text\"}, {\"bbox_2d\": [60, 585, 973, 713], \"label\": \"Text\"}, {\"bbox_2d\": [60, 804, 975, 884], \"label\": \"Text\"}, {\"bbox_2d\": [60, 898, 977, 947], \"label\": \"Text\"}, {\"bbox_2d\": [60, 772, 362, 791], \"label\": \"Section-header\"}, {\"bbox_2d\": [17, 414, 36, 621], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 444, 564, 463], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 294, 475, 312], \"label\": \"Section-header\"}, {\"bbox_2d\": [60, 42, 388, 61], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 104, 943, 200], \"label\": \"Picture\"}, {\"bbox_2d\": [433, 200, 602, 217], \"label\": \"Caption\"}, {\"bbox_2d\": [408, 386, 627, 431], \"label\": \"Picture\"}, {\"bbox_2d\": [489, 719, 585, 763], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [90, 139, 103, 153], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 134, 798, 154], \"label\": \"Section-header\"}, {\"bbox_2d\": [131, 234, 516, 291], \"label\": \"Text\"}, {\"bbox_2d\": [131, 323, 513, 469], \"label\": \"Text\"}, {\"bbox_2d\": [131, 502, 518, 625], \"label\": \"Text\"}, {\"bbox_2d\": [556, 435, 942, 469], \"label\": \"Text\"}, {\"bbox_2d\": [556, 502, 942, 581], \"label\": \"Text\"}, {\"bbox_2d\": [556, 234, 909, 268], \"label\": \"Text\"}, {\"bbox_2d\": [567, 300, 946, 403], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [91, 137, 104, 155], \"label\": \"Page-header\"}, {\"bbox_2d\": [129, 126, 799, 158], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 229, 516, 292], \"label\": \"Text\"}, {\"bbox_2d\": [555, 229, 909, 270], \"label\": \"Text\"}, {\"bbox_2d\": [130, 319, 514, 471], \"label\": \"Text\"}, {\"bbox_2d\": [130, 498, 520, 628], \"label\": \"Text\"}, {\"bbox_2d\": [555, 498, 944, 583], \"label\": \"Text\"}, {\"bbox_2d\": [555, 436, 944, 469], \"label\": \"Text\"}, {\"bbox_2d\": [564, 294, 946, 409], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [252, 68, 790, 84], \"label\": \"Title\"}, {\"bbox_2d\": [279, 102, 761, 116], \"label\": \"Text\"}, {\"bbox_2d\": [272, 118, 745, 131], \"label\": \"Text\"}, {\"bbox_2d\": [173, 133, 839, 146], \"label\": \"Text\"}, {\"bbox_2d\": [206, 159, 839, 172], \"label\": \"Text\"}, {\"bbox_2d\": [173, 174, 848, 197], \"label\": \"Text\"}, {\"bbox_2d\": [200, 200, 848, 213], \"label\": \"Text\"}, {\"bbox_2d\": [173, 226, 833, 309], \"label\": \"Text\"}, {\"bbox_2d\": [209, 336, 389, 349], \"label\": \"Section-header\"}, {\"bbox_2d\": [33, 247, 86, 716], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 374, 506, 505], \"label\": \"Text\"}, {\"bbox_2d\": [91, 509, 506, 705], \"label\": \"Text\"}, {\"bbox_2d\": [91, 709, 506, 856], \"label\": \"Text\"}, {\"bbox_2d\": [91, 861, 506, 949], \"label\": \"Text\"}, {\"bbox_2d\": [536, 924, 951, 949], \"label\": \"Text\"}, {\"bbox_2d\": [536, 727, 951, 920], \"label\": \"Text\"}, {\"bbox_2d\": [536, 530, 951, 726], \"label\": \"Text\"}, {\"bbox_2d\": [536, 336, 951, 528], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [251, 70, 790, 84], \"label\": \"Title\"}, {\"bbox_2d\": [281, 100, 761, 131], \"label\": \"Text\"}, {\"bbox_2d\": [174, 133, 868, 144], \"label\": \"Text\"}, {\"bbox_2d\": [225, 147, 817, 158], \"label\": \"Text\"}, {\"bbox_2d\": [208, 161, 834, 172], \"label\": \"Text\"}, {\"bbox_2d\": [174, 175, 868, 185], \"label\": \"Text\"}, {\"bbox_2d\": [165, 188, 877, 199], \"label\": \"Text\"}, {\"bbox_2d\": [199, 202, 843, 213], \"label\": \"Text\"}, {\"bbox_2d\": [184, 228, 858, 308], \"label\": \"Text\"}, {\"bbox_2d\": [91, 372, 506, 460], \"label\": \"Text\"}, {\"bbox_2d\": [91, 462, 506, 505], \"label\": \"Text\"}, {\"bbox_2d\": [536, 336, 951, 529], \"label\": \"Text\"}, {\"bbox_2d\": [91, 510, 506, 703], \"label\": \"Text\"}, {\"bbox_2d\": [536, 531, 952, 724], \"label\": \"Text\"}, {\"bbox_2d\": [91, 708, 506, 856], \"label\": \"Text\"}, {\"bbox_2d\": [536, 726, 952, 919], \"label\": \"Text\"}, {\"bbox_2d\": [536, 921, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [91, 861, 506, 949], \"label\": \"Text\"}, {\"bbox_2d\": [31, 248, 61, 718], \"label\": \"Page-header\"}, {\"bbox_2d\": [210, 337, 387, 348], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [116, 149, 915, 265], \"label\": \"Section-header\"}, {\"bbox_2d\": [146, 274, 902, 296], \"label\": \"Section-header\"}, {\"bbox_2d\": [175, 448, 878, 745], \"label\": \"Picture\"}, {\"bbox_2d\": [176, 790, 878, 990], \"label\": \"Picture\"}, {\"bbox_2d\": [92, 2, 469, 25], \"label\": \"Page-header\"}, {\"bbox_2d\": [311, 46, 736, 147], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [116, 152, 924, 260], \"label\": \"Section-header\"}, {\"bbox_2d\": [143, 276, 898, 295], \"label\": \"Text\"}, {\"bbox_2d\": [699, 295, 935, 306], \"label\": \"Text\"}, {\"bbox_2d\": [311, 45, 735, 149], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 326, 930, 424], \"label\": \"Table\"}, {\"bbox_2d\": [176, 437, 876, 992], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 302], \"label\": \"Text\"}, {\"bbox_2d\": [121, 314, 911, 741], \"label\": \"Text\"}, {\"bbox_2d\": [121, 753, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 94, 911, 302], \"label\": \"Text\"}, {\"bbox_2d\": [121, 314, 911, 740], \"label\": \"Text\"}, {\"bbox_2d\": [121, 752, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [830, 74, 937, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [98, 105, 604, 120], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 137, 933, 152], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 169, 937, 235], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 244, 937, 309], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 319, 749, 334], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 334, 937, 384], \"label\": \"List-item\"}, {\"bbox_2d\": [156, 387, 937, 418], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 428, 937, 478], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 510, 937, 644], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 756, 937, 786], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 787, 937, 856], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 858, 937, 910], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 912, 937, 927], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 929, 937, 958], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 976, 191, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [900, 958, 937, 973], \"label\": \"Page-footer\"}, {\"bbox_2d\": [156, 486, 932, 500], \"label\": \"Text\"}, {\"bbox_2d\": [156, 542, 937, 625], \"label\": \"Text\"}, {\"bbox_2d\": [156, 655, 937, 736], \"label\": \"Text\"}, {\"bbox_2d\": [98, 860, 937, 907], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [98, 102, 609, 122], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 133, 937, 154], \"label\": \"Section-header\"}, {\"bbox_2d\": [98, 165, 947, 236], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 240, 949, 311], \"label\": \"List-item\"}, {\"bbox_2d\": [160, 315, 782, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [160, 332, 948, 386], \"label\": \"List-item\"}, {\"bbox_2d\": [160, 383, 943, 420], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 425, 944, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 507, 946, 646], \"label\": \"List-item\"}, {\"bbox_2d\": [98, 757, 941, 788], \"label\": \"Footnote\"}, {\"bbox_2d\": [98, 785, 944, 887], \"label\": \"Footnote\"}, {\"bbox_2d\": [98, 884, 941, 929], \"label\": \"Footnote\"}, {\"bbox_2d\": [98, 927, 941, 957], \"label\": \"Footnote\"}, {\"bbox_2d\": [899, 955, 945, 1001], \"label\": \"Page-footer\"}, {\"bbox_2d\": [98, 974, 195, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [161, 651, 947, 738], \"label\": \"Text\"}, {\"bbox_2d\": [161, 483, 936, 503], \"label\": \"Text\"}, {\"bbox_2d\": [830, 72, 942, 90], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [795, 69, 975, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [936, 35, 975, 67], \"label\": \"Page-header\"}, {\"bbox_2d\": [79, 190, 208, 205], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 210, 452, 220], \"label\": \"Text\"}, {\"bbox_2d\": [79, 324, 943, 346], \"label\": \"Text\"}, {\"bbox_2d\": [79, 354, 645, 364], \"label\": \"Text\"}, {\"bbox_2d\": [79, 511, 873, 522], \"label\": \"Text\"}, {\"bbox_2d\": [79, 557, 427, 568], \"label\": \"Text\"}, {\"bbox_2d\": [79, 538, 180, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 577, 957, 758], \"label\": \"Table\"}, {\"bbox_2d\": [79, 825, 958, 950], \"label\": \"Table\"}, {\"bbox_2d\": [79, 773, 927, 796], \"label\": \"Text\"}, {\"bbox_2d\": [79, 805, 620, 815], \"label\": \"Text\"}, {\"bbox_2d\": [943, 994, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [79, 231, 958, 315], \"label\": \"Table\"}, {\"bbox_2d\": [79, 374, 958, 500], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [795, 34, 976, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [78, 189, 208, 205], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 538, 180, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 209, 451, 220], \"label\": \"Text\"}, {\"bbox_2d\": [78, 323, 947, 347], \"label\": \"Text\"}, {\"bbox_2d\": [78, 355, 644, 366], \"label\": \"Text\"}, {\"bbox_2d\": [78, 511, 876, 523], \"label\": \"Text\"}, {\"bbox_2d\": [78, 557, 427, 569], \"label\": \"Text\"}, {\"bbox_2d\": [78, 773, 930, 797], \"label\": \"Text\"}, {\"bbox_2d\": [78, 805, 620, 816], \"label\": \"Text\"}, {\"bbox_2d\": [78, 229, 957, 314], \"label\": \"Table\"}, {\"bbox_2d\": [78, 374, 957, 503], \"label\": \"Table\"}, {\"bbox_2d\": [77, 576, 958, 762], \"label\": \"Table\"}, {\"bbox_2d\": [78, 825, 958, 953], \"label\": \"Table\"}, {\"bbox_2d\": [943, 991, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [1002, 206, 1018, 334], \"label\": \"Page-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 147, 317, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 318, 517, 427], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 584, 356, 601], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 609, 517, 633], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 639, 436, 652], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 800, 517, 827], \"label\": \"Section-header\"}, {\"bbox_2d\": [862, 1001, 977, 1013], \"label\": \"Page-footer\"}, {\"bbox_2d\": [565, 467, 963, 495], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 163, 517, 300], \"label\": \"Text\"}, {\"bbox_2d\": [565, 166, 963, 396], \"label\": \"Text\"}, {\"bbox_2d\": [565, 147, 963, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 429, 517, 502], \"label\": \"Text\"}, {\"bbox_2d\": [121, 507, 517, 550], \"label\": \"Text\"}, {\"bbox_2d\": [121, 657, 517, 699], \"label\": \"Text\"}, {\"bbox_2d\": [121, 704, 517, 777], \"label\": \"Text\"}, {\"bbox_2d\": [121, 832, 517, 922], \"label\": \"Text\"}, {\"bbox_2d\": [565, 499, 961, 970], \"label\": \"Picture\"}, {\"bbox_2d\": [577, 540, 802, 553], \"label\": \"Section-header\"}, {\"bbox_2d\": [587, 584, 955, 715], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 610, 889, 686], \"label\": \"Picture\"}, {\"bbox_2d\": [578, 686, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [577, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 747, 956, 852], \"label\": \"Picture\"}, {\"bbox_2d\": [578, 746, 956, 851], \"label\": \"Picture\"}, {\"bbox_2d\": [577, 747, 956, 851], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [577, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [577, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 685, 956, 718], \"label\": \"Picture\"}, {\"bbox_2d\": [576, 68", "label": "```json[{\"bbox_2d\": [120, 149, 316, 158], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 149, 963, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 319, 518, 331], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 584, 357, 597], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 605, 523, 632], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 641, 437, 651], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 801, 523, 827], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 467, 968, 494], \"label\": \"Section-header\"}, {\"bbox_2d\": [863, 1001, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [120, 164, 522, 299], \"label\": \"Text\"}, {\"bbox_2d\": [120, 337, 524, 424], \"label\": \"Text\"}, {\"bbox_2d\": [120, 431, 522, 503], \"label\": \"Text\"}, {\"bbox_2d\": [120, 509, 522, 550], \"label\": \"Text\"}, {\"bbox_2d\": [120, 657, 522, 698], \"label\": \"Text\"}, {\"bbox_2d\": [120, 704, 522, 776], \"label\": \"Text\"}, {\"bbox_2d\": [120, 833, 522, 921], \"label\": \"Text\"}, {\"bbox_2d\": [565, 167, 971, 395], \"label\": \"Text\"}, {\"bbox_2d\": [565, 401, 967, 442], \"label\": \"Text\"}, {\"bbox_2d\": [569, 501, 961, 970], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [126, 18, 256, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [40, 14, 83, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [73, 99, 246, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 114, 244, 126], \"label\": \"Section-header\"}, {\"bbox_2d\": [104, 130, 390, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [114, 146, 254, 158], \"label\": \"Section-header\"}, {\"bbox_2d\": [141, 162, 894, 238], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 241, 914, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 336, 908, 425], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 429, 902, 487], \"label\": \"List-item\"}, {\"bbox_2d\": [141, 490, 900, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [42, 999, 170, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [136, 535, 917, 851], \"label\": \"Table\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 82, 28], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 20, 255, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 101, 246, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 117, 244, 127], \"label\": \"Section-header\"}, {\"bbox_2d\": [103, 133, 391, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [113, 148, 255, 159], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 164, 898, 237], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 242, 918, 331], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 336, 919, 425], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 430, 906, 488], \"label\": \"List-item\"}, {\"bbox_2d\": [137, 493, 904, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [136, 537, 919, 852], \"label\": \"Table\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [49, 52, 325, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [49, 110, 463, 297], \"label\": \"Text\"}, {\"bbox_2d\": [49, 296, 462, 426], \"label\": \"Text\"}, {\"bbox_2d\": [49, 429, 461, 970], \"label\": \"Text\"}, {\"bbox_2d\": [498, 109, 916, 183], \"label\": \"Text\"}, {\"bbox_2d\": [498, 186, 917, 706], \"label\": \"Text\"}, {\"bbox_2d\": [498, 709, 917, 970], \"label\": \"Text\"}, {\"bbox_2d\": [465, 981, 493, 994], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [49, 51, 344, 98], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 109, 917, 183], \"label\": \"Text\"}, {\"bbox_2d\": [47, 109, 481, 296], \"label\": \"Text\"}, {\"bbox_2d\": [501, 184, 925, 708], \"label\": \"Text\"}, {\"bbox_2d\": [48, 297, 470, 427], \"label\": \"Text\"}, {\"bbox_2d\": [46, 428, 471, 970], \"label\": \"Text\"}, {\"bbox_2d\": [498, 709, 926, 970], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [197, 81, 300, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 182, 365, 204], \"label\": \"Section-header\"}, {\"bbox_2d\": [102, 288, 567, 309], \"label\": \"Section-header\"}, {\"bbox_2d\": [275, 104, 914, 169], \"label\": \"Text\"}, {\"bbox_2d\": [275, 205, 890, 270], \"label\": \"Text\"}, {\"bbox_2d\": [275, 318, 707, 334], \"label\": \"Text\"}, {\"bbox_2d\": [291, 357, 883, 404], \"label\": \"Text\"}, {\"bbox_2d\": [291, 409, 707, 425], \"label\": \"Text\"}, {\"bbox_2d\": [291, 426, 840, 443], \"label\": \"Text\"}, {\"bbox_2d\": [291, 447, 877, 478], \"label\": \"Text\"}, {\"bbox_2d\": [291, 483, 914, 547], \"label\": \"Text\"}, {\"bbox_2d\": [291, 549, 840, 565], \"label\": \"Text\"}, {\"bbox_2d\": [318, 569, 898, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 618, 914, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 686, 894, 733], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 736, 601, 753], \"label\": \"List-item\"}, {\"bbox_2d\": [275, 756, 556, 773], \"label\": \"List-item\"}, {\"bbox_2d\": [275, 337, 505, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 776, 914, 856], \"label\": \"Text\"}, {\"bbox_2d\": [291, 879, 901, 926], \"label\": \"Text\"}, {\"bbox_2d\": [275, 858, 541, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [100, 977, 379, 992], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [197, 85, 300, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [197, 187, 364, 204], \"label\": \"Section-header\"}, {\"bbox_2d\": [101, 292, 567, 309], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 859, 539, 875], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 757, 556, 773], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 740, 600, 753], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 689, 889, 734], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 620, 909, 683], \"label\": \"List-item\"}, {\"bbox_2d\": [318, 570, 896, 616], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 552, 838, 565], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 486, 911, 546], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 450, 876, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 429, 839, 444], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 338, 505, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [291, 360, 881, 405], \"label\": \"Text\"}, {\"bbox_2d\": [291, 411, 707, 424], \"label\": \"Text\"}, {\"bbox_2d\": [273, 321, 706, 334], \"label\": \"Text\"}, {\"bbox_2d\": [273, 210, 886, 270], \"label\": \"Text\"}, {\"bbox_2d\": [273, 108, 910, 169], \"label\": \"Text\"}, {\"bbox_2d\": [291, 779, 907, 855], \"label\": \"Text\"}, {\"bbox_2d\": [291, 881, 895, 926], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 379, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [105, 100, 458, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 195, 468, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 367, 468, 378], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 570, 395, 581], \"label\": \"Section-header\"}, {\"bbox_2d\": [111, 117, 899, 174], \"label\": \"Text\"}, {\"bbox_2d\": [138, 211, 907, 299], \"label\": \"Text\"}, {\"bbox_2d\": [138, 305, 907, 345], \"label\": \"Text\"}, {\"bbox_2d\": [138, 382, 910, 423], \"label\": \"Text\"}, {\"bbox_2d\": [138, 429, 907, 504], \"label\": \"Text\"}, {\"bbox_2d\": [138, 509, 899, 550], \"label\": \"Text\"}, {\"bbox_2d\": [138, 586, 910, 659], \"label\": \"Text\"}, {\"bbox_2d\": [138, 665, 910, 706], \"label\": \"Text\"}, {\"bbox_2d\": [43, 1003, 169, 1014], \"label\": \"Page-footer\"}, {\"bbox_2d\": [38, 16, 81, 28], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 255, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [103, 101, 462, 112], \"label\": \"Section-header\"}, {\"bbox_2d\": [113, 195, 468, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [113, 368, 469, 378], \"label\": \"Section-header\"}, {\"bbox_2d\": [113, 571, 397, 582], \"label\": \"Section-header\"}, {\"bbox_2d\": [137, 587, 915, 660], \"label\": \"Text\"}, {\"bbox_2d\": [137, 665, 916, 707], \"label\": \"Text\"}, {\"bbox_2d\": [137, 509, 905, 551], \"label\": \"Text\"}, {\"bbox_2d\": [137, 430, 914, 504], \"label\": \"Text\"}, {\"bbox_2d\": [137, 383, 916, 425], \"label\": \"Text\"}, {\"bbox_2d\": [137, 305, 915, 347], \"label\": \"Text\"}, {\"bbox_2d\": [137, 211, 914, 300], \"label\": \"Text\"}, {\"bbox_2d\": [122, 117, 904, 175], \"label\": \"Text\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [114, 97, 922, 934], \"label\": \"Table\"}, {\"bbox_2d\": [114, 938, 784, 946], \"label\": \"Text\"}, {\"bbox_2d\": [114, 947, 719, 956], \"label\": \"Text\"}, {\"bbox_2d\": [114, 957, 517, 969], \"label\": \"Text\"}, {\"bbox_2d\": [45, 998, 170, 1016], \"label\": \"Page-footer\"}, {\"bbox_2d\": [44, 15, 242, 29], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [38, 18, 354, 29], \"label\": \"Page-header\"}, {\"bbox_2d\": [43, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [113, 937, 784, 945], \"label\": \"Text\"}, {\"bbox_2d\": [146, 948, 718, 968], \"label\": \"Text\"}, {\"bbox_2d\": [109, 99, 917, 377], \"label\": \"Table\"}, {\"bbox_2d\": [111, 394, 916, 654], \"label\": \"Table\"}, {\"bbox_2d\": [112, 670, 915, 930], \"label\": \"Table\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 67, 951, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 67, 951, 98], \"label\": \"Text\"}, {\"bbox_2d\": [536, 221, 951, 252], \"label\": \"Text\"}, {\"bbox_2d\": [91, 286, 546, 300], \"label\": \"Text\"}, {\"bbox_2d\": [536, 375, 951, 440], \"label\": \"Text\"}, {\"bbox_2d\": [536, 480, 951, 600], \"label\": \"Text\"}, {\"bbox_2d\": [91, 495, 506, 527], \"label\": \"Text\"}, {\"bbox_2d\": [91, 596, 506, 613], \"label\": \"Text\"}, {\"bbox_2d\": [91, 670, 506, 716], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 738, 506, 784], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 796, 506, 827], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 838, 506, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 881, 506, 938], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 678, 951, 724], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 736, 951, 784], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 796, 951, 869], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 881, 951, 938], \"label\": \"List-item\"}, {\"bbox_2d\": [662, 277, 944, 384], \"label\": \"Formula\"}, {\"bbox_2d\": [609, 396, 935, 462], \"label\": \"Formula\"}, {\"bbox_2d\": [609, 483, 936, 549], \"label\": \"Formula\"}, {\"bbox_2d\": [142, 98, 567, 269], \"label\": \"Formula\"}, {\"bbox_2d\": [551, 128, 951, 200], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 69, 143, 84], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [934, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 72, 951, 98], \"label\": \"Text\"}, {\"bbox_2d\": [91, 70, 148, 83], \"label\": \"Text\"}, {\"bbox_2d\": [91, 287, 489, 299], \"label\": \"Text\"}, {\"bbox_2d\": [536, 223, 951, 249], \"label\": \"Text\"}, {\"bbox_2d\": [553, 119, 951, 206], \"label\": \"Formula\"}, {\"bbox_2d\": [141, 97, 455, 273], \"label\": \"Formula\"}, {\"bbox_2d\": [553, 273, 939, 359], \"label\": \"Formula\"}, {\"bbox_2d\": [147, 313, 450, 480], \"label\": \"Formula\"}, {\"bbox_2d\": [584, 450, 904, 491], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 377, 951, 433], \"label\": \"Text\"}, {\"bbox_2d\": [91, 493, 506, 525], \"label\": \"Text\"}, {\"bbox_2d\": [536, 510, 951, 611], \"label\": \"Text\"}, {\"bbox_2d\": [91, 595, 506, 614], \"label\": \"Text\"}, {\"bbox_2d\": [91, 540, 526, 581], \"label\": \"Formula\"}, {\"bbox_2d\": [567, 670, 651, 681], \"label\": \"Text\"}, {\"bbox_2d\": [544, 684, 951, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 670, 506, 708], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 711, 506, 749], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 752, 506, 776], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 780, 506, 817], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 821, 506, 859], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 862, 506, 900], \"label\": \"List-item\"}, {\"bbox_2d\": [99, 903, 506, 941], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 889, 951, 941], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 834, 951, 886], \"label\": \"List-item\"}, {\"bbox_2d\": [536, 780, 951, 831], \"label\": \"List-item\"}, {\"bbox_2d\": [544, 725, 951, 776], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [95, 105, 391, 131], \"label\": \"Section-header\"}, {\"bbox_2d\": [564, 153, 695, 170], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 153, 244, 170], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 374, 167, 391], \"label\": \"Section-header\"}, {\"bbox_2d\": [564, 661, 740, 677], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 179, 403, 190], \"label\": \"Text\"}, {\"bbox_2d\": [95, 200, 346, 245], \"label\": \"Text\"}, {\"bbox_2d\": [95, 252, 369, 281], \"label\": \"Text\"}, {\"bbox_2d\": [95, 290, 320, 300], \"label\": \"Text\"}, {\"bbox_2d\": [95, 309, 329, 319], \"label\": \"Text\"}, {\"bbox_2d\": [95, 328, 284, 342], \"label\": \"Text\"}, {\"bbox_2d\": [564, 179, 852, 210], \"label\": \"Text\"}, {\"bbox_2d\": [564, 221, 750, 249], \"label\": \"Text\"}, {\"bbox_2d\": [564, 258, 738, 268], \"label\": \"Text\"}, {\"bbox_2d\": [564, 278, 760, 290], \"label\": \"Text\"}, {\"bbox_2d\": [564, 300, 856, 310], \"label\": \"Text\"}, {\"bbox_2d\": [564, 321, 696, 331], \"label\": \"Text\"}, {\"bbox_2d\": [564, 340, 764, 350], \"label\": \"Text\"}, {\"bbox_2d\": [564, 359, 740, 369], \"label\": \"Text\"}, {\"bbox_2d\": [564, 378, 701, 388], \"label\": \"Text\"}, {\"bbox_2d\": [564, 397, 842, 407], \"label\": \"Text\"}, {\"bbox_2d\": [564, 416, 824, 426], \"label\": \"Text\"}, {\"bbox_2d\": [564, 435, 774, 445], \"label\": \"Text\"}, {\"bbox_2d\": [564, 454, 719, 464], \"label\": \"Text\"}, {\"bbox_2d\": [564, 474, 766, 484], \"label\": \"Text\"}, {\"bbox_2d\": [564, 493, 750, 503], \"label\": \"Text\"}, {\"bbox_2d\": [564, 512, 848, 522], \"label\": \"Text\"}, {\"bbox_2d\": [564, 531, 809, 541], \"label\": \"Text\"}, {\"bbox_2d\": [564, 550, 809, 578], \"label\": \"Text\"}, {\"bbox_2d\": [564, 587, 738, 597], \"label\": \"Text\"}, {\"bbox_2d\": [564, 606, 945, 627], \"label\": \"Text\"}, {\"bbox_2d\": [564, 636, 945, 656], \"label\": \"Text\"}, {\"bbox_2d\": [564, 685, 754, 695], \"label\": \"Text\"}, {\"bbox_2d\": [564, 705, 771, 715], \"label\": \"Text\"}, {\"bbox_2d\": [564, 724, 824, 734], \"label\": \"Text\"}, {\"bbox_2d\": [564, 743, 856, 753], \"label\": \"Text\"}, {\"bbox_2d\": [564, 763, 864, 773], \"label\": \"Text\"}, {\"bbox_2d\": [564, 782, 879, 792], \"label\": \"Text\"}, {\"bbox_2d\": [564, 801, 899, 811], \"label\": \"Text\"}, {\"bbox_2d\": [564, 821, 901, 831], \"label\": \"Text\"}, {\"bbox_2d\": [95, 399, 412, 409], \"label\": \"Text\"}, {\"bbox_2d\": [95, 418, 385, 428], \"label\": \"Text\"}, {\"bbox_2d\": [95, 437, 320, 447], \"label\": \"Text\"}, {\"bbox_2d\": [95, 456, 300, 466], \"label\": \"Text\"}, {\"bbox_2d\": [95, 475, 446, 485], \"label\": \"Text\"}, {\"bbox_2d\": [95, 494, 458, 504], \"label\": \"Text\"}, {\"bbox_2d\": [95, 513, 488, 523], \"label\": \"Text\"}, {\"bbox_2d\": [95, 532, 407, 542], \"label\": \"Text\"}, {\"bbox_2d\": [95, 551, 407, 561], \"label\": \"Text\"}, {\"bbox_2d\": [95, 570, 440, 580], \"label\": \"Text\"}, {\"bbox_2d\": [95, 589, 434, 599], \"label\": \"Text\"}, {\"bbox_2d\": [95, 608, 361, 618], \"label\": \"Text\"}, {\"bbox_2d\": [95, 627, 425, 637], \"label\": \"Text\"}, {\"bbox_2d\": [95, 646, 425, 656], \"label\": \"Text\"}, {\"bbox_2d\": [95, 665, 343, 675], \"label\": \"Text\"}, {\"bbox_2d\": [95, 684, 237, 694], \"label\": \"Text\"}, {\"bbox_2d\": [95, 703, 284, 713], \"label\": \"Text\"}, {\"bbox_2d\": [95, 722, 286, 732], \"label\": \"Text\"}, {\"bbox_2d\": [95, 741, 286, 751], \"label\": \"Text\"}, {\"bbox_2d\": [95, 760, 505, 770], \"label\": \"Text\"}, {\"bbox_2d\": [95, 779, 249, 789], \"label\": \"Text\"}, {\"bbox_2d\": [95, 798, 261, 808], \"label\": \"Text\"}, {\"bbox_2d\": [95, 817, 289, 827], \"label\": \"Text\"}, {\"bbox_2d\": [95, 836, 367, 846], \"label\": \"Text\"}, {\"bbox_2d\": [95, 855, 293, 8", "label": "```json[{\"bbox_2d\": [563, 149, 697, 172], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 150, 244, 172], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 100, 393, 132], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 370, 168, 392], \"label\": \"Section-header\"}, {\"bbox_2d\": [563, 656, 741, 679], \"label\": \"Section-header\"}, {\"bbox_2d\": [563, 683, 901, 838], \"label\": \"Text\"}, {\"bbox_2d\": [563, 176, 946, 628], \"label\": \"Text\"}, {\"bbox_2d\": [95, 397, 507, 952], \"label\": \"Text\"}, {\"bbox_2d\": [95, 177, 458, 342], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [80, 47, 253, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [203, 87, 858, 973], \"label\": \"Table\"}, {\"bbox_2d\": [501, 981, 521, 997], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [80, 49, 265, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 95, 865, 220], \"label\": \"Table\"}, {\"bbox_2d\": [230, 285, 865, 357], \"label\": \"Table\"}, {\"bbox_2d\": [230, 384, 865, 747], \"label\": \"Table\"}, {\"bbox_2d\": [227, 775, 863, 969], \"label\": \"Table\"}, {\"bbox_2d\": [502, 977, 523, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [200, 755, 597, 770], \"label\": \"Section-header\"}, {\"bbox_2d\": [200, 366, 620, 381], \"label\": \"Section-header\"}, {\"bbox_2d\": [167, 237, 665, 261], \"label\": \"Section-header\"}, {\"bbox_2d\": [200, 270, 311, 285], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [81, 110, 500, 390], \"label\": \"Text\"}, {\"bbox_2d\": [81, 441, 491, 656], \"label\": \"Text\"}, {\"bbox_2d\": [81, 668, 499, 869], \"label\": \"Text\"}, {\"bbox_2d\": [81, 879, 499, 940], \"label\": \"Text\"}, {\"bbox_2d\": [533, 825, 948, 946], \"label\": \"Text\"}, {\"bbox_2d\": [533, 645, 945, 814], \"label\": \"Text\"}, {\"bbox_2d\": [533, 464, 936, 633], \"label\": \"Text\"}, {\"bbox_2d\": [533, 347, 949, 451], \"label\": \"Text\"}, {\"bbox_2d\": [533, 169, 945, 333], \"label\": \"Text\"}, {\"bbox_2d\": [533, 110, 945, 155], \"label\": \"Text\"}, {\"bbox_2d\": [81, 407, 402, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [818, 1005, 954, 1014], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [81, 112, 502, 389], \"label\": \"Text\"}, {\"bbox_2d\": [81, 441, 496, 624], \"label\": \"Text\"}, {\"bbox_2d\": [81, 630, 487, 656], \"label\": \"Text\"}, {\"bbox_2d\": [81, 669, 502, 868], \"label\": \"Text\"}, {\"bbox_2d\": [81, 881, 502, 938], \"label\": \"Text\"}, {\"bbox_2d\": [533, 826, 949, 946], \"label\": \"Text\"}, {\"bbox_2d\": [533, 645, 948, 813], \"label\": \"Text\"}, {\"bbox_2d\": [533, 465, 951, 632], \"label\": \"Text\"}, {\"bbox_2d\": [533, 347, 953, 452], \"label\": \"Text\"}, {\"bbox_2d\": [533, 167, 950, 334], \"label\": \"Text\"}, {\"bbox_2d\": [533, 112, 952, 153], \"label\": \"Text\"}, {\"bbox_2d\": [81, 408, 408, 437], \"label\": \"Section-header\"}, {\"bbox_2d\": [822, 1005, 955, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 93, 584, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 820, 480, 836], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 854, 926, 882], \"label\": \"Text\"}, {\"bbox_2d\": [241, 729, 852, 758], \"label\": \"Text\"}, {\"bbox_2d\": [241, 773, 712, 787], \"label\": \"Text\"}, {\"bbox_2d\": [231, 500, 884, 703], \"label\": \"Text\"}, {\"bbox_2d\": [231, 315, 739, 486], \"label\": \"Text\"}, {\"bbox_2d\": [231, 293, 821, 303], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 127, 926, 202], \"label\": \"Text\"}, {\"bbox_2d\": [231, 219, 926, 278], \"label\": \"Text\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 93, 584, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 127, 926, 202], \"label\": \"Text\"}, {\"bbox_2d\": [231, 219, 926, 278], \"label\": \"Text\"}, {\"bbox_2d\": [241, 730, 858, 758], \"label\": \"Text\"}, {\"bbox_2d\": [241, 775, 712, 787], \"label\": \"Text\"}, {\"bbox_2d\": [231, 854, 926, 882], \"label\": \"Text\"}, {\"bbox_2d\": [109, 820, 481, 836], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 292, 821, 303], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 314, 882, 703], \"label\": \"Text\"}, {\"bbox_2d\": [619, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 568, 105], \"label\": \"Text\"}, {\"bbox_2d\": [228, 124, 864, 369], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 374, 511, 384], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 412, 406, 426], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 446, 673, 457], \"label\": \"Text\"}, {\"bbox_2d\": [231, 511, 766, 523], \"label\": \"Text\"}, {\"bbox_2d\": [231, 610, 835, 638], \"label\": \"Text\"}, {\"bbox_2d\": [231, 724, 859, 736], \"label\": \"Text\"}, {\"bbox_2d\": [231, 747, 593, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 769, 912, 779], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 790, 907, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 829, 412, 840], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 851, 922, 878], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 648, 442, 658], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 660, 822, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 679, 564, 690], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 695, 689, 706], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 535, 442, 545], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 549, 826, 560], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 564, 564, 575], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 580, 689, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [691, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 569, 105], \"label\": \"Text\"}, {\"bbox_2d\": [231, 124, 862, 369], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 373, 510, 384], \"label\": \"Caption\"}, {\"bbox_2d\": [109, 411, 406, 426], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 445, 672, 457], \"label\": \"Text\"}, {\"bbox_2d\": [231, 512, 766, 524], \"label\": \"Text\"}, {\"bbox_2d\": [231, 610, 840, 637], \"label\": \"Text\"}, {\"bbox_2d\": [231, 723, 860, 735], \"label\": \"Text\"}, {\"bbox_2d\": [231, 850, 926, 878], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 828, 412, 840], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 790, 913, 818], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 768, 913, 780], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 746, 594, 758], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 695, 689, 707], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 679, 563, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 663, 822, 675], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 648, 442, 660], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 581, 689, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 565, 563, 577], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 549, 825, 561], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 534, 442, 546], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 483, 620, 495], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 467, 836, 479], \"label\": \"List-item\"}, {\"bbox_2d\": [691, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 122, 671, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 166, 894, 194], \"label\": \"Text\"}, {\"bbox_2d\": [292, 212, 903, 240], \"label\": \"Text\"}, {\"bbox_2d\": [292, 249, 741, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 268, 718, 280], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 287, 900, 315], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 323, 535, 335], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 342, 549, 354], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 362, 726, 374], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 382, 684, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [292, 412, 931, 472], \"label\": \"Text\"}, {\"bbox_2d\": [292, 490, 914, 536], \"label\": \"Text\"}, {\"bbox_2d\": [292, 865, 930, 926], \"label\": \"Text\"}, {\"bbox_2d\": [316, 548, 789, 818], \"label\": \"Text\"}, {\"bbox_2d\": [914, 979, 931, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 672, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 166, 895, 195], \"label\": \"Text\"}, {\"bbox_2d\": [293, 213, 903, 242], \"label\": \"Text\"}, {\"bbox_2d\": [293, 246, 740, 261], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 265, 717, 281], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 285, 900, 316], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 320, 535, 336], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 340, 548, 355], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 359, 726, 375], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 379, 683, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 412, 931, 473], \"label\": \"List-item\"}, {\"bbox_2d\": [293, 489, 915, 535], \"label\": \"Text\"}, {\"bbox_2d\": [479, 546, 682, 558], \"label\": \"Section-header\"}, {\"bbox_2d\": [315, 569, 790, 629], \"label\": \"Table\"}, {\"bbox_2d\": [317, 781, 716, 817], \"label\": \"Text\"}, {\"bbox_2d\": [293, 865, 929, 926], \"label\": \"Text\"}, {\"bbox_2d\": [913, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [83, 761, 500, 835], \"label\": \"Text\"}, {\"bbox_2d\": [544, 761, 957, 848], \"label\": \"Text\"}, {\"bbox_2d\": [83, 844, 505, 946], \"label\": \"Text\"}, {\"bbox_2d\": [544, 860, 952, 931], \"label\": \"Text\"}, {\"bbox_2d\": [544, 942, 967, 981], \"label\": \"Text\"}, {\"bbox_2d\": [83, 957, 502, 983], \"label\": \"Text\"}, {\"bbox_2d\": [648, 999, 976, 1007], \"label\": \"Page-footer\"}, {\"bbox_2d\": [219, 357, 981, 656], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [543, 762, 961, 848], \"label\": \"Text\"}, {\"bbox_2d\": [543, 859, 955, 930], \"label\": \"Text\"}, {\"bbox_2d\": [543, 941, 971, 981], \"label\": \"Text\"}, {\"bbox_2d\": [83, 762, 503, 833], \"label\": \"Text\"}, {\"bbox_2d\": [83, 844, 511, 945], \"label\": \"Text\"}, {\"bbox_2d\": [83, 956, 505, 981], \"label\": \"Text\"}, {\"bbox_2d\": [648, 1000, 975, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [0, 210, 1034, 725], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 208, 911, 306], \"label\": \"Text\"}, {\"bbox_2d\": [121, 377, 911, 556], \"label\": \"Text\"}, {\"bbox_2d\": [121, 627, 658, 643], \"label\": \"Text\"}, {\"bbox_2d\": [121, 714, 911, 756], \"label\": \"Text\"}, {\"bbox_2d\": [121, 827, 911, 870], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [174, 887, 911, 963], \"label\": \"Formula\"}, {\"bbox_2d\": [372, 770, 911, 839], \"label\": \"Formula\"}, {\"bbox_2d\": [263, 664, 911, 694], \"label\": \"Formula\"}, {\"bbox_2d\": [400, 582, 911, 600], \"label\": \"Formula\"}, {\"bbox_2d\": [397, 330, 911, 361], \"label\": \"Formula\"}, {\"bbox_2d\": [404, 164, 911, 182], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 138], \"label\": \"Text\"}, {\"bbox_2d\": [121, 208, 911, 307], \"label\": \"Text\"}, {\"bbox_2d\": [121, 376, 911, 556], \"label\": \"Text\"}, {\"bbox_2d\": [121, 627, 658, 642], \"label\": \"Text\"}, {\"bbox_2d\": [121, 713, 911, 756], \"label\": \"Text\"}, {\"bbox_2d\": [121, 826, 911, 869], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [402, 165, 911, 192], \"label\": \"Formula\"}, {\"bbox_2d\": [398, 333, 911, 360], \"label\": \"Formula\"}, {\"bbox_2d\": [399, 584, 911, 610], \"label\": \"Formula\"}, {\"bbox_2d\": [266, 668, 911, 715], \"label\": \"Formula\"}, {\"bbox_2d\": [334, 771, 911, 819], \"label\": \"Formula\"}, {\"bbox_2d\": [174, 886, 911, 978], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [62, 51, 264, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [60, 94, 471, 146], \"label\": \"Text\"}, {\"bbox_2d\": [60, 151, 471, 388], \"label\": \"Text\"}, {\"bbox_2d\": [60, 393, 470, 575], \"label\": \"Text\"}, {\"bbox_2d\": [508, 95, 921, 277], \"label\": \"Text\"}, {\"bbox_2d\": [508, 282, 920, 520], \"label\": \"Text\"}, {\"bbox_2d\": [508, 525, 921, 576], \"label\": \"Text\"}, {\"bbox_2d\": [60, 581, 470, 781], \"label\": \"Text\"}, {\"bbox_2d\": [508, 655, 920, 801], \"label\": \"Text\"}, {\"bbox_2d\": [508, 857, 921, 966], \"label\": \"Text\"}, {\"bbox_2d\": [60, 787, 470, 962], \"label\": \"Text\"}, {\"bbox_2d\": [475, 979, 499, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [507, 826, 760, 840], \"label\": \"Section-header\"}, {\"bbox_2d\": [546, 579, 774, 593], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 599, 826, 612], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 618, 764, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [546, 638, 755, 651], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [62, 51, 278, 82], \"label\": \"Page-header\"}, {\"bbox_2d\": [508, 92, 928, 278], \"label\": \"Text\"}, {\"bbox_2d\": [59, 93, 476, 148], \"label\": \"Text\"}, {\"bbox_2d\": [59, 148, 483, 390], \"label\": \"Text\"}, {\"bbox_2d\": [508, 278, 930, 520], \"label\": \"Text\"}, {\"bbox_2d\": [59, 391, 483, 576], \"label\": \"Text\"}, {\"bbox_2d\": [508, 521, 928, 576], \"label\": \"Text\"}, {\"bbox_2d\": [58, 577, 480, 783], \"label\": \"Text\"}, {\"bbox_2d\": [507, 652, 931, 800], \"label\": \"Text\"}, {\"bbox_2d\": [547, 577, 782, 595], \"label\": \"List-item\"}, {\"bbox_2d\": [548, 596, 840, 614], \"label\": \"List-item\"}, {\"bbox_2d\": [547, 614, 771, 632], \"label\": \"List-item\"}, {\"bbox_2d\": [547, 633, 761, 650], \"label\": \"List-item\"}, {\"bbox_2d\": [507, 853, 930, 964], \"label\": \"Text\"}, {\"bbox_2d\": [57, 786, 485, 964], \"label\": \"Text\"}, {\"bbox_2d\": [507, 824, 762, 841], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [90, 874, 969, 927], \"label\": \"Caption\"}, {\"bbox_2d\": [933, 942, 942, 950], \"label\": \"Page-footer\"}, {\"bbox_2d\": [308, 145, 759, 465], \"label\": \"Picture\"}, {\"bbox_2d\": [305, 477, 733, 839], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [92, 874, 968, 924], \"label\": \"Caption\"}, {\"bbox_2d\": [985, 943, 998, 951], \"label\": \"Page-footer\"}, {\"bbox_2d\": [307, 143, 764, 836], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [119, 575, 534, 966], \"label\": \"Picture\"}, {\"bbox_2d\": [556, 574, 970, 965], \"label\": \"Picture\"}, {\"bbox_2d\": [863, 997, 977, 1012], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 147, 963, 559], \"label\": \"Table\"}, {\"bbox_2d\": [132, 587, 510, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [565, 587, 922, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 625, 528, 801], \"label\": \"Text\"}, {\"bbox_2d\": [121, 800, 526, 852], \"label\": \"Text\"}, {\"bbox_2d\": [121, 850, 348, 927], \"label\": \"Text\"}, {\"bbox_2d\": [565, 625, 964, 769], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [120, 147, 966, 560], \"label\": \"Table\"}, {\"bbox_2d\": [566, 588, 926, 614], \"label\": \"Section-header\"}, {\"bbox_2d\": [132, 588, 514, 614], \"label\": \"Section-header\"}, {\"bbox_2d\": [132, 628, 527, 677], \"label\": \"Text\"}, {\"bbox_2d\": [132, 680, 529, 795], \"label\": \"Text\"}, {\"bbox_2d\": [132, 798, 527, 847], \"label\": \"Text\"}, {\"bbox_2d\": [132, 850, 352, 925], \"label\": \"Text\"}, {\"bbox_2d\": [566, 628, 962, 768], \"label\": \"Text\"}, {\"bbox_2d\": [715, 835, 960, 954], \"label\": \"Picture\"}, {\"bbox_2d\": [358, 863, 524, 952], \"label\": \"Picture\"}, {\"bbox_2d\": [949, 1001, 977, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [863, 1003, 937, 1013], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 94, 671, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 331, 268, 347], \"label\": \"Text\"}, {\"bbox_2d\": [121, 446, 386, 462], \"label\": \"Text\"}, {\"bbox_2d\": [121, 688, 911, 866], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [121, 641, 609, 656], \"label\": \"Section-header\"}, {\"bbox_2d\": [272, 475, 911, 522], \"label\": \"Formula\"}, {\"bbox_2d\": [272, 537, 911, 606], \"label\": \"Formula\"}, {\"bbox_2d\": [272, 882, 911, 946], \"label\": \"Formula\"}, {\"bbox_2d\": [272, 921, 911, 994], \"label\": \"Formula\"}, {\"bbox_2d\": [380, 372, 911, 419], \"label\": \"Formula\"}, {\"bbox_2d\": [184, 127, 911, 255], \"label\": \"Formula\"}, {\"bbox_2d\": [184, 261, 911, 321], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 646, 111], \"label\": \"Text\"}, {\"bbox_2d\": [121, 331, 294, 347], \"label\": \"Text\"}, {\"bbox_2d\": [121, 447, 387, 463], \"label\": \"Text\"}, {\"bbox_2d\": [121, 686, 911, 866], \"label\": \"Text\"}, {\"bbox_2d\": [266, 932, 911, 967], \"label\": \"Formula\"}, {\"bbox_2d\": [272, 878, 911, 935], \"label\": \"Formula\"}, {\"bbox_2d\": [273, 464, 911, 550], \"label\": \"Formula\"}, {\"bbox_2d\": [273, 529, 911, 623], \"label\": \"Formula\"}, {\"bbox_2d\": [384, 369, 911, 392], \"label\": \"Formula\"}, {\"bbox_2d\": [381, 400, 911, 432], \"label\": \"Formula\"}, {\"bbox_2d\": [185, 258, 911, 316], \"label\": \"Formula\"}, {\"bbox_2d\": [185, 178, 911, 273], \"label\": \"Formula\"}, {\"bbox_2d\": [185, 125, 911, 183], \"label\": \"Formula\"}, {\"bbox_2d\": [147, 642, 721, 655], \"label\": \"Section-header\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [758, 71, 962, 99], \"label\": \"Page-header\"}, {\"bbox_2d\": [552, 114, 962, 225], \"label\": \"Text\"}, {\"bbox_2d\": [97, 114, 511, 242], \"label\": \"Text\"}, {\"bbox_2d\": [97, 246, 511, 867], \"label\": \"Text\"}, {\"bbox_2d\": [97, 871, 511, 988], \"label\": \"Text\"}, {\"bbox_2d\": [552, 277, 962, 498], \"label\": \"Text\"}, {\"bbox_2d\": [552, 502, 962, 896], \"label\": \"Text\"}, {\"bbox_2d\": [552, 898, 962, 988], \"label\": \"Text\"}, {\"bbox_2d\": [521, 1000, 547, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [552, 250, 769, 264], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [759, 69, 977, 100], \"label\": \"Page-header\"}, {\"bbox_2d\": [551, 246, 770, 264], \"label\": \"Section-header\"}, {\"bbox_2d\": [519, 994, 551, 1009], \"label\": \"Page-footer\"}, {\"bbox_2d\": [97, 111, 524, 241], \"label\": \"Text\"}, {\"bbox_2d\": [97, 243, 527, 865], \"label\": \"Text\"}, {\"bbox_2d\": [99, 866, 520, 984], \"label\": \"Text\"}, {\"bbox_2d\": [553, 891, 971, 984], \"label\": \"Text\"}, {\"bbox_2d\": [551, 273, 977, 496], \"label\": \"Text\"}, {\"bbox_2d\": [552, 497, 988, 890], \"label\": \"Text\"}, {\"bbox_2d\": [551, 112, 977, 223], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 33, 379, 98], \"label\": \"Picture\"}, {\"bbox_2d\": [72, 148, 470, 284], \"label\": \"Text\"}, {\"bbox_2d\": [517, 149, 916, 643], \"label\": \"Text\"}, {\"bbox_2d\": [72, 571, 470, 708], \"label\": \"Text\"}, {\"bbox_2d\": [72, 713, 470, 959], \"label\": \"Text\"}, {\"bbox_2d\": [72, 539, 367, 567], \"label\": \"Section-header\"}, {\"bbox_2d\": [72, 304, 406, 332], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 337, 456, 362], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 367, 456, 392], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 397, 456, 442], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 447, 456, 489], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 494, 456, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [699, 768, 915, 788], \"label\": \"Caption\"}, {\"bbox_2d\": [517, 929, 701, 938], \"label\": \"Caption\"}, {\"bbox_2d\": [61, 999, 170, 1015], \"label\": \"Page-footer\"}, {\"bbox_2d\": [699, 658, 916, 785], \"label\": \"Picture\"}, {\"bbox_2d\": [517, 819, 734, 926], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [146, 60, 378, 78], \"label\": \"Page-header\"}, {\"bbox_2d\": [72, 149, 476, 284], \"label\": \"Text\"}, {\"bbox_2d\": [517, 149, 921, 644], \"label\": \"Text\"}, {\"bbox_2d\": [72, 572, 476, 707], \"label\": \"Text\"}, {\"bbox_2d\": [72, 713, 475, 958], \"label\": \"Text\"}, {\"bbox_2d\": [58, 1001, 170, 1018], \"label\": \"Page-footer\"}, {\"bbox_2d\": [516, 821, 735, 926], \"label\": \"Picture\"}, {\"bbox_2d\": [698, 659, 916, 766], \"label\": \"Picture\"}, {\"bbox_2d\": [699, 770, 919, 786], \"label\": \"Caption\"}, {\"bbox_2d\": [517, 930, 701, 937], \"label\": \"Caption\"}, {\"bbox_2d\": [72, 539, 373, 566], \"label\": \"Section-header\"}, {\"bbox_2d\": [85, 494, 460, 519], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 447, 460, 488], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 400, 460, 441], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 369, 456, 394], \"label\": \"List-item\"}, {\"bbox_2d\": [85, 337, 460, 362], \"label\": \"List-item\"}, {\"bbox_2d\": [72, 304, 412, 331], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [199, 69, 934, 101], \"label\": \"Section-header\"}, {\"bbox_2d\": [175, 127, 930, 957], \"label\": \"Table\"}, {\"bbox_2d\": [820, 938, 935, 992], \"label\": \"Text\"}, {\"bbox_2d\": [175, 842, 238, 882], \"label\": \"Text\"}, {\"bbox_2d\": [175, 885, 238, 924], \"label\": \"Text\"}, {\"bbox_2d\": [175, 927, 238, 966], \"label\": \"Text\"}, {\"bbox_2d\": [118, 949, 127, 962], \"label\": \"Page-header\"}, {\"bbox_2d\": [118, 960, 127, 968], \"label\": \"Page-header\"}, {\"bbox_2d\": [39, 959, 147, 1019], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [198, 69, 937, 100], \"label\": \"Page-header\"}, {\"bbox_2d\": [199, 688, 216, 956], \"label\": \"Section-header\"}, {\"bbox_2d\": [236, 765, 253, 956], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 130, 290, 928], \"label\": \"Text\"}, {\"bbox_2d\": [954, 618, 971, 955], \"label\": \"Text\"}, {\"bbox_2d\": [307, 119, 931, 956], \"label\": \"Table\"}, {\"bbox_2d\": [113, 948, 133, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [41, 960, 148, 1020], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 34, 409, 88], \"label\": \"Picture\"}, {\"bbox_2d\": [124, 138, 561, 158], \"label\": \"Section-header\"}, {\"bbox_2d\": [154, 178, 752, 197], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 215, 912, 245], \"label\": \"Text\"}, {\"bbox_2d\": [121, 264, 912, 309], \"label\": \"Text\"}, {\"bbox_2d\": [121, 385, 891, 401], \"label\": \"Text\"}, {\"bbox_2d\": [121, 419, 912, 449], \"label\": \"Text\"}, {\"bbox_2d\": [121, 650, 912, 696], \"label\": \"Text\"}, {\"bbox_2d\": [121, 713, 586, 729], \"label\": \"Text\"}, {\"bbox_2d\": [121, 744, 912, 775], \"label\": \"Text\"}, {\"bbox_2d\": [121, 792, 736, 808], \"label\": \"Text\"}, {\"bbox_2d\": [121, 824, 912, 855], \"label\": \"Text\"}, {\"bbox_2d\": [121, 872, 912, 932], \"label\": \"Text\"}, {\"bbox_2d\": [154, 465, 320, 480], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 618, 315, 634], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 498, 888, 520], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 523, 875, 571], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 573, 912, 603], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 332, 912, 368], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 875, 912, 932], \"label\": \"Text\"}, {\"bbox_2d\": [533, 968, 552, 983], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [123, 136, 568, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [154, 179, 760, 197], \"label\": \"Section-header\"}, {\"bbox_2d\": [148, 332, 919, 368], \"label\": \"Section-header\"}, {\"bbox_2d\": [123, 215, 918, 244], \"label\": \"Text\"}, {\"bbox_2d\": [123, 263, 918, 308], \"label\": \"Text\"}, {\"bbox_2d\": [123, 386, 892, 400], \"label\": \"Text\"}, {\"bbox_2d\": [123, 418, 918, 448], \"label\": \"Text\"}, {\"bbox_2d\": [123, 651, 918, 696], \"label\": \"Text\"}, {\"bbox_2d\": [123, 713, 594, 728], \"label\": \"Text\"}, {\"bbox_2d\": [123, 746, 918, 775], \"label\": \"Text\"}, {\"bbox_2d\": [123, 825, 918, 855], \"label\": \"Text\"}, {\"bbox_2d\": [123, 793, 738, 808], \"label\": \"Text\"}, {\"bbox_2d\": [123, 872, 918, 932], \"label\": \"Text\"}, {\"bbox_2d\": [533, 970, 552, 983], \"label\": \"Page-footer\"}, {\"bbox_2d\": [154, 465, 326, 480], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 497, 894, 572], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 572, 918, 602], \"label\": \"List-item\"}, {\"bbox_2d\": [154, 619, 321, 634], \"label\": \"List-item\"}, {\"bbox_2d\": [123, 34, 410, 87], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 917, 122], \"label\": \"Text\"}, {\"bbox_2d\": [121, 139, 917, 177], \"label\": \"Text\"}, {\"bbox_2d\": [121, 194, 287, 204], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 400, 195, 410], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 501, 421, 511], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 204, 903, 386], \"label\": \"Table\"}, {\"bbox_2d\": [121, 414, 917, 484], \"label\": \"Text\"}, {\"bbox_2d\": [121, 512, 905, 847], \"label\": \"Table\"}, {\"bbox_2d\": [496, 941, 539, 954], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 917, 123], \"label\": \"Text\"}, {\"bbox_2d\": [121, 135, 917, 179], \"label\": \"Text\"}, {\"bbox_2d\": [121, 412, 918, 484], \"label\": \"Text\"}, {\"bbox_2d\": [121, 191, 285, 206], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 397, 196, 412], \"label\": \"Section-header\"}, {\"bbox_2d\": [135, 206, 899, 383], \"label\": \"Table\"}, {\"bbox_2d\": [135, 512, 906, 847], \"label\": \"Table\"}, {\"bbox_2d\": [121, 497, 423, 512], \"label\": \"Section-header\"}, {\"bbox_2d\": [498, 937, 541, 955], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [464, 54, 595, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [309, 54, 319, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 85, 506, 182], \"label\": \"Text\"}, {\"bbox_2d\": [123, 187, 506, 924], \"label\": \"Text\"}, {\"bbox_2d\": [555, 87, 937, 475], \"label\": \"Text\"}, {\"bbox_2d\": [555, 479, 937, 722], \"label\": \"Text\"}, {\"bbox_2d\": [555, 726, 937, 924], \"label\": \"Text\"}, {\"bbox_2d\": [526, 988, 533, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [309, 55, 320, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [463, 54, 595, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [740, 55, 752, 68], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 84, 510, 184], \"label\": \"Text\"}, {\"bbox_2d\": [123, 186, 523, 504], \"label\": \"Text\"}, {\"bbox_2d\": [123, 505, 523, 925], \"label\": \"Text\"}, {\"bbox_2d\": [554, 84, 954, 475], \"label\": \"Text\"}, {\"bbox_2d\": [554, 476, 949, 722], \"label\": \"Text\"}, {\"bbox_2d\": [554, 723, 949, 925], \"label\": \"Text\"}, {\"bbox_2d\": [525, 989, 533, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [240, 93, 794, 878], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 948, 914, 1019], \"label\": \"Caption\"}, {\"bbox_2d\": [121, 1031, 151, 1039], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [245, 90, 791, 482], \"label\": \"Picture\"}, {\"bbox_2d\": [237, 490, 792, 879], \"label\": \"Picture\"}, {\"bbox_2d\": [121, 920, 911, 1020], \"label\": \"Caption\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 594, 111], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 133, 453, 144], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 309, 277, 320], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 325, 913, 448], \"label\": \"Text\"}, {\"bbox_2d\": [121, 164, 913, 192], \"label\": \"Text\"}, {\"bbox_2d\": [181, 196, 543, 288], \"label\": \"Table\"}, {\"bbox_2d\": [121, 699, 880, 711], \"label\": \"Text\"}, {\"bbox_2d\": [121, 731, 516, 743], \"label\": \"Section-header\"}, {\"bbox_2d\": [134, 750, 542, 836], \"label\": \"Table\"}, {\"bbox_2d\": [548, 465, 918, 679], \"label\": \"Table\"}, {\"bbox_2d\": [121, 758, 914, 836], \"label\": \"Table\"}, {\"bbox_2d\": [498, 957, 537, 969], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 93, 599, 113], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 129, 452, 145], \"label\": \"Section-header\"}, {\"bbox_2d\": [181, 192, 547, 289], \"label\": \"Table\"}, {\"bbox_2d\": [121, 161, 917, 193], \"label\": \"Text\"}, {\"bbox_2d\": [121, 320, 921, 449], \"label\": \"Text\"}, {\"bbox_2d\": [121, 304, 277, 321], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 695, 885, 712], \"label\": \"Text\"}, {\"bbox_2d\": [200, 464, 919, 680], \"label\": \"Table\"}, {\"bbox_2d\": [135, 742, 914, 835], \"label\": \"Table\"}, {\"bbox_2d\": [121, 727, 515, 744], \"label\": \"Text\"}, {\"bbox_2d\": [498, 953, 541, 971], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [357, 106, 911, 155], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 169, 911, 309], \"label\": \"Text\"}, {\"bbox_2d\": [125, 317, 911, 360], \"label\": \"Text\"}, {\"bbox_2d\": [125, 421, 340, 436], \"label\": \"Text\"}, {\"bbox_2d\": [125, 526, 277, 541], \"label\": \"Text\"}, {\"bbox_2d\": [125, 554, 250, 569], \"label\": \"Text\"}, {\"bbox_2d\": [125, 679, 669, 695], \"label\": \"Text\"}, {\"bbox_2d\": [125, 766, 197, 781], \"label\": \"Text\"}, {\"bbox_2d\": [508, 974, 527, 989], \"label\": \"Page-footer\"}, {\"bbox_2d\": [335, 700, 701, 746], \"label\": \"Formula\"}, {\"bbox_2d\": [301, 459, 911, 522], \"label\": \"Formula\"}, {\"bbox_2d\": [341, 360, 911, 409], \"label\": \"Formula\"}, {\"bbox_2d\": [236, 599, 742, 662], \"label\": \"Formula\"}, {\"bbox_2d\": [326, 824, 705, 932], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [360, 108, 911, 159], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 168, 911, 306], \"label\": \"Text\"}, {\"bbox_2d\": [125, 319, 911, 355], \"label\": \"Text\"}, {\"bbox_2d\": [125, 422, 421, 434], \"label\": \"Text\"}, {\"bbox_2d\": [125, 528, 273, 549], \"label\": \"Text\"}, {\"bbox_2d\": [151, 553, 251, 565], \"label\": \"Text\"}, {\"bbox_2d\": [125, 681, 668, 703], \"label\": \"Text\"}, {\"bbox_2d\": [151, 767, 195, 780], \"label\": \"Text\"}, {\"bbox_2d\": [456, 362, 911, 413], \"label\": \"Formula\"}, {\"bbox_2d\": [317, 447, 911, 522], \"label\": \"Formula\"}, {\"bbox_2d\": [233, 599, 768, 675], \"label\": \"Formula\"}, {\"bbox_2d\": [335, 727, 700, 745], \"label\": \"Formula\"}, {\"bbox_2d\": [328, 814, 705, 945], \"label\": \"Formula\"}, {\"bbox_2d\": [509, 974, 527, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 95, 911, 466], \"label\": \"Text\"}, {\"bbox_2d\": [121, 479, 911, 849], \"label\": \"Text\"}, {\"bbox_2d\": [121, 862, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 466], \"label\": \"Text\"}, {\"bbox_2d\": [121, 478, 911, 849], \"label\": \"Text\"}, {\"bbox_2d\": [121, 861, 911, 959], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 73, 945, 496], \"label\": \"Text\"}, {\"bbox_2d\": [91, 510, 945, 934], \"label\": \"Text\"}, {\"bbox_2d\": [121, 949, 945, 966], \"label\": \"Text\"}, {\"bbox_2d\": [513, 994, 522, 1006], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 75, 944, 495], \"label\": \"Text\"}, {\"bbox_2d\": [91, 513, 944, 933], \"label\": \"Text\"}, {\"bbox_2d\": [121, 951, 944, 964], \"label\": \"Text\"}, {\"bbox_2d\": [513, 993, 522, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 121, 219, 142], \"label\": \"Section-header\"}, {\"bbox_2d\": [120, 164, 938, 924], \"label\": \"Table\"}, {\"bbox_2d\": [904, 975, 931, 990], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 124, 219, 143], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 164, 950, 925], \"label\": \"Table\"}, {\"bbox_2d\": [904, 978, 932, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [795, 68, 976, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [918, 41, 974, 63], \"label\": \"Page-header\"}, {\"bbox_2d\": [1004, 207, 1014, 335], \"label\": \"Page-header\"}, {\"bbox_2d\": [79, 207, 272, 221], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 551, 398, 565], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 452, 195, 462], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 492, 119, 502], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 468, 957, 478], \"label\": \"Text\"}, {\"bbox_2d\": [79, 496, 946, 529], \"label\": \"Text\"}, {\"bbox_2d\": [79, 575, 679, 585], \"label\": \"Text\"}, {\"bbox_2d\": [79, 526, 798, 536], \"label\": \"Text\"}, {\"bbox_2d\": [79, 550, 398, 564], \"label\": \"Section-header\"}, {\"bbox_2d\": [79, 237, 958, 429], \"label\": \"Table\"}, {\"bbox_2d\": [79, 595, 958, 892], \"label\": \"Table\"}, {\"bbox_2d\": [942, 994, 990, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [795, 34, 976, 80], \"label\": \"Page-header\"}, {\"bbox_2d\": [1002, 206, 1018, 334], \"label\": \"Page-header\"}, {\"bbox_2d\": [943, 991, 991, 1002], \"label\": \"Page-footer\"}, {\"bbox_2d\": [76, 191, 957, 431], \"label\": \"Table\"}, {\"bbox_2d\": [76, 593, 959, 893], \"label\": \"Table\"}, {\"bbox_2d\": [78, 450, 195, 462], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 547, 399, 566], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 490, 119, 502], \"label\": \"Section-header\"}, {\"bbox_2d\": [78, 467, 956, 479], \"label\": \"Text\"}, {\"bbox_2d\": [78, 507, 950, 530], \"label\": \"Text\"}, {\"bbox_2d\": [78, 573, 680, 584], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [727, 57, 929, 88], \"label\": \"Page-header\"}, {\"bbox_2d\": [485, 987, 513, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [188, 112, 821, 912], \"label\": \"Picture\"}, {\"bbox_2d\": [285, 927, 712, 943], \"label\": \"Caption\"}]```", "label": "```json[{\"bbox_2d\": [184, 113, 825, 917], \"label\": \"Picture\"}, {\"bbox_2d\": [725, 56, 940, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [284, 926, 706, 941], \"label\": \"Caption\"}, {\"bbox_2d\": [486, 981, 518, 995], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [368, 80, 671, 91], \"label\": \"Page-header\"}, {\"bbox_2d\": [114, 145, 506, 449], \"label\": \"Text\"}, {\"bbox_2d\": [114, 485, 506, 516], \"label\": \"Text\"}, {\"bbox_2d\": [114, 554, 506, 757], \"label\": \"Text\"}, {\"bbox_2d\": [114, 768, 506, 859], \"label\": \"Text\"}, {\"bbox_2d\": [114, 865, 506, 955], \"label\": \"Text\"}, {\"bbox_2d\": [528, 145, 922, 294], \"label\": \"Text\"}, {\"bbox_2d\": [528, 300, 922, 353], \"label\": \"Text\"}, {\"bbox_2d\": [528, 359, 922, 390], \"label\": \"Text\"}, {\"bbox_2d\": [114, 465, 271, 479], \"label\": \"Section-header\"}, {\"bbox_2d\": [114, 533, 353, 547], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 406, 922, 790], \"label\": \"Table\"}, {\"bbox_2d\": [528, 799, 922, 862], \"label\": \"Text\"}, {\"bbox_2d\": [528, 866, 922, 898], \"label\": \"Text\"}, {\"bbox_2d\": [513, 999, 522, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [367, 79, 668, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [115, 144, 511, 447], \"label\": \"Text\"}, {\"bbox_2d\": [115, 485, 508, 516], \"label\": \"Text\"}, {\"bbox_2d\": [115, 553, 511, 759], \"label\": \"Text\"}, {\"bbox_2d\": [115, 768, 511, 857], \"label\": \"Text\"}, {\"bbox_2d\": [115, 865, 511, 954], \"label\": \"Text\"}, {\"bbox_2d\": [527, 144, 923, 292], \"label\": \"Text\"}, {\"bbox_2d\": [527, 300, 923, 350], \"label\": \"Text\"}, {\"bbox_2d\": [527, 358, 920, 389], \"label\": \"Text\"}, {\"bbox_2d\": [527, 801, 919, 860], \"label\": \"Text\"}, {\"bbox_2d\": [527, 864, 919, 898], \"label\": \"Text\"}, {\"bbox_2d\": [512, 999, 523, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [526, 403, 920, 794], \"label\": \"Table\"}, {\"bbox_2d\": [115, 465, 270, 477], \"label\": \"Section-header\"}, {\"bbox_2d\": [115, 533, 353, 545], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [199, 66, 454, 82], \"label\": \"Page-header\"}, {\"bbox_2d\": [199, 139, 888, 174], \"label\": \"Text\"}, {\"bbox_2d\": [199, 203, 888, 257], \"label\": \"Text\"}, {\"bbox_2d\": [199, 287, 888, 363], \"label\": \"Text\"}, {\"bbox_2d\": [199, 393, 888, 427], \"label\": \"Text\"}, {\"bbox_2d\": [199, 456, 888, 491], \"label\": \"Text\"}, {\"bbox_2d\": [199, 520, 888, 618], \"label\": \"Text\"}, {\"bbox_2d\": [199, 647, 888, 703], \"label\": \"Text\"}, {\"bbox_2d\": [199, 733, 888, 767], \"label\": \"Text\"}, {\"bbox_2d\": [199, 796, 888, 830], \"label\": \"Text\"}, {\"bbox_2d\": [199, 859, 888, 914], \"label\": \"Text\"}, {\"bbox_2d\": [199, 944, 888, 957], \"label\": \"Text\"}, {\"bbox_2d\": [201, 647, 343, 661], \"label\": \"Section-header\"}, {\"bbox_2d\": [99, 733, 418, 746], \"label\": \"Section-header\"}, {\"bbox_2d\": [99, 796, 268, 809], \"label\": \"Section-header\"}, {\"bbox_2d\": [201, 746, 888, 767], \"label\": \"Text\"}, {\"bbox_2d\": [112, 945, 131, 957], \"label\": \"Page-footer\"}, {\"bbox_2d\": [41, 959, 145, 1018], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [198, 69, 456, 83], \"label\": \"Section-header\"}, {\"bbox_2d\": [199, 140, 887, 173], \"label\": \"Text\"}, {\"bbox_2d\": [199, 204, 887, 258], \"label\": \"Text\"}, {\"bbox_2d\": [199, 288, 887, 364], \"label\": \"Text\"}, {\"bbox_2d\": [199, 394, 887, 427], \"label\": \"Text\"}, {\"bbox_2d\": [199, 458, 887, 491], \"label\": \"Text\"}, {\"bbox_2d\": [199, 521, 887, 618], \"label\": \"Text\"}, {\"bbox_2d\": [101, 648, 887, 702], \"label\": \"List-item\"}, {\"bbox_2d\": [199, 860, 887, 914], \"label\": \"Text\"}, {\"bbox_2d\": [199, 944, 887, 956], \"label\": \"Text\"}, {\"bbox_2d\": [113, 948, 134, 958], \"label\": \"Page-footer\"}, {\"bbox_2d\": [41, 959, 145, 1020], \"label\": \"Picture\"}, {\"bbox_2d\": [101, 796, 887, 829], \"label\": \"List-item\"}, {\"bbox_2d\": [101, 733, 892, 766], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [147, 82, 444, 90], \"label\": \"Caption\"}, {\"bbox_2d\": [580, 82, 903, 90], \"label\": \"Caption\"}, {\"bbox_2d\": [88, 199, 502, 226], \"label\": \"Text\"}, {\"bbox_2d\": [88, 300, 502, 449], \"label\": \"Text\"}, {\"bbox_2d\": [88, 517, 502, 636], \"label\": \"Text\"}, {\"bbox_2d\": [88, 696, 502, 724], \"label\": \"Text\"}, {\"bbox_2d\": [88, 773, 502, 801], \"label\": \"Text\"}, {\"bbox_2d\": [88, 847, 358, 858], \"label\": \"Text\"}, {\"bbox_2d\": [88, 904, 502, 947], \"label\": \"Text\"}, {\"bbox_2d\": [198, 484, 390, 495], \"label\": \"Section-header\"}, {\"bbox_2d\": [555, 208, 869, 221], \"label\": \"Text\"}, {\"bbox_2d\": [575, 234, 951, 250], \"label\": \"Formula\"}, {\"bbox_2d\": [525, 263, 951, 421], \"label\": \"Formula\"}, {\"bbox_2d\": [551, 423, 951, 462], \"label\": \"Formula\"}, {\"bbox_2d\": [577, 475, 951, 586], \"label\": \"Formula\"}, {\"bbox_2d\": [551, 583, 951, 636], \"label\": \"Text\"}, {\"bbox_2d\": [555, 682, 831, 692], \"label\": \"Text\"}, {\"bbox_2d\": [555, 752, 951, 790], \"label\": \"Text\"}, {\"bbox_2d\": [555, 829, 913, 841], \"label\": \"Text\"}, {\"bbox_2d\": [119, 652, 502, 701], \"label\": \"Formula\"}, {\"bbox_2d\": [201, 739, 502, 751], \"label\": \"Formula\"}, {\"bbox_2d\": [180, 814, 502, 826], \"label\": \"Formula\"}, {\"bbox_2d\": [88, 875, 374, 886], \"label\": \"Text\"}, {\"bbox_2d\": [186, 872, 502, 889], \"label\": \"Formula\"}, {\"bbox_2d\": [516, 853, 951, 947], \"label\": \"Formula\"}, {\"bbox_2d\": [555, 780, 951, 806], \"label\": \"Formula\"}, {\"bbox_2d\": [555, 699, 951, 748], \"label\": \"Formula\"}, {\"bbox_2d\": [555, 601, 951, 616], \"label\": \"Text\"}, {\"bbox_2d\": [61, 932, 502, 949], \"label\": \"Page-header\"}, {\"bbox_2d\": [132, 240, 502, 345], \"label\": \"Table\"}, {\"bbox_2d\": [616, 99, 865, 168], \"label\": \"Table\"}, {\"bbox_2d\": [143, 98, 450, 168], \"label\": \"Table\"}, {\"bbox_2d\": [91, 91, 107, 103], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [580, 81, 901, 93], \"label\": \"Caption\"}, {\"bbox_2d\": [145, 81, 445, 93], \"label\": \"Caption\"}, {\"bbox_2d\": [88, 37, 105, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [615, 97, 867, 168], \"label\": \"Picture\"}, {\"bbox_2d\": [144, 96, 446, 168], \"label\": \"Picture\"}, {\"bbox_2d\": [555, 209, 867, 222], \"label\": \"List-item\"}, {\"bbox_2d\": [555, 418, 810, 431], \"label\": \"List-item\"}, {\"bbox_2d\": [555, 679, 831, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [555, 751, 794, 765], \"label\": \"List-item\"}, {\"bbox_2d\": [555, 829, 886, 842], \"label\": \"List-item\"}, {\"bbox_2d\": [654, 776, 948, 811], \"label\": \"Formula\"}, {\"bbox_2d\": [679, 704, 948, 736], \"label\": \"Formula\"}, {\"bbox_2d\": [550, 443, 948, 606], \"label\": \"Formula\"}, {\"bbox_2d\": [550, 288, 931, 405], \"label\": \"Formula\"}, {\"bbox_2d\": [692, 236, 948, 249], \"label\": \"Formula\"}, {\"bbox_2d\": [115, 645, 503, 689], \"label\": \"Formula\"}, {\"bbox_2d\": [179, 813, 503, 831], \"label\": \"Formula\"}, {\"bbox_2d\": [217, 872, 503, 890], \"label\": \"Formula\"}, {\"bbox_2d\": [545, 853, 948, 943], \"label\": \"Formula\"}, {\"bbox_2d\": [88, 197, 503, 225], \"label\": \"Text\"}, {\"bbox_2d\": [575, 264, 608, 277], \"label\": \"Text\"}, {\"bbox_2d\": [88, 298, 503, 450], \"label\": \"Text\"}, {\"bbox_2d\": [88, 693, 502, 723], \"label\": \"Text\"}, {\"bbox_2d\": [88, 516, 503, 634], \"label\": \"Text\"}, {\"bbox_2d\": [88, 771, 503, 799], \"label\": \"Text\"}, {\"bbox_2d\": [88, 845, 357, 858], \"label\": \"Text\"}, {\"bbox_2d\": [88, 904, 502, 947], \"label\": \"Text\"}, {\"bbox_2d\": [575, 609, 948, 667], \"label\": \"Text\"}, {\"bbox_2d\": [199, 734, 503, 777], \"label\": \"Formula\"}, {\"bbox_2d\": [132, 236, 503, 303], \"label\": \"Formula\"}, {\"bbox_2d\": [200, 483, 390, 495], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [164, 139, 645, 337], \"label\": \"Text\"}, {\"bbox_2d\": [164, 382, 511, 446], \"label\": \"Text\"}, {\"bbox_2d\": [164, 452, 517, 514], \"label\": \"Text\"}, {\"bbox_2d\": [164, 518, 833, 580], \"label\": \"Text\"}, {\"bbox_2d\": [513, 991, 522, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [167, 139, 639, 339], \"label\": \"Section-header\"}, {\"bbox_2d\": [167, 379, 835, 580], \"label\": \"Text\"}, {\"bbox_2d\": [515, 990, 523, 1006], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [322, 77, 343, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [457, 57, 605, 69], \"label\": \"Page-header\"}, {\"bbox_2d\": [717, 77, 738, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [545, 93, 909, 309], \"label\": \"Text\"}, {\"bbox_2d\": [147, 93, 516, 228], \"label\": \"Text\"}, {\"bbox_2d\": [147, 229, 516, 674], \"label\": \"Text\"}, {\"bbox_2d\": [545, 309, 909, 367], \"label\": \"Text\"}, {\"bbox_2d\": [545, 369, 909, 503], \"label\": \"Text\"}, {\"bbox_2d\": [545, 505, 909, 765], \"label\": \"Text\"}, {\"bbox_2d\": [545, 768, 909, 866], \"label\": \"Text\"}, {\"bbox_2d\": [147, 677, 516, 765], \"label\": \"Text\"}, {\"bbox_2d\": [147, 770, 516, 841], \"label\": \"Text\"}, {\"bbox_2d\": [147, 846, 516, 967], \"label\": \"Text\"}, {\"bbox_2d\": [545, 875, 909, 967], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [320, 75, 343, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [456, 56, 604, 70], \"label\": \"Page-header\"}, {\"bbox_2d\": [715, 75, 739, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [542, 877, 914, 965], \"label\": \"Text\"}, {\"bbox_2d\": [148, 877, 535, 965], \"label\": \"Text\"}, {\"bbox_2d\": [148, 771, 516, 872], \"label\": \"Text\"}, {\"bbox_2d\": [543, 770, 912, 871], \"label\": \"Text\"}, {\"bbox_2d\": [541, 506, 913, 764], \"label\": \"Text\"}, {\"bbox_2d\": [147, 679, 517, 767], \"label\": \"Text\"}, {\"bbox_2d\": [543, 373, 912, 500], \"label\": \"Text\"}, {\"bbox_2d\": [541, 305, 912, 367], \"label\": \"Text\"}, {\"bbox_2d\": [147, 231, 518, 675], \"label\": \"Text\"}, {\"bbox_2d\": [543, 94, 912, 299], \"label\": \"Text\"}, {\"bbox_2d\": [147, 94, 517, 228], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [801, 61, 973, 89], \"label\": \"Page-header\"}, {\"bbox_2d\": [307, 115, 510, 131], \"label\": \"Caption\"}, {\"bbox_2d\": [139, 590, 533, 981], \"label\": \"Text\"}, {\"bbox_2d\": [589, 589, 980, 664], \"label\": \"Text\"}, {\"bbox_2d\": [589, 665, 984, 981], \"label\": \"Text\"}, {\"bbox_2d\": [491, 467, 800, 515], \"label\": \"Caption\"}, {\"bbox_2d\": [401, 525, 693, 542], \"label\": \"Caption\"}, {\"bbox_2d\": [146, 115, 953, 442], \"label\": \"Picture\"}, {\"bbox_2d\": [808, 1008, 983, 1022], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [792, 59, 988, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [148, 112, 947, 516], \"label\": \"Picture\"}, {\"bbox_2d\": [401, 526, 690, 540], \"label\": \"Caption\"}, {\"bbox_2d\": [587, 586, 985, 661], \"label\": \"Text\"}, {\"bbox_2d\": [589, 661, 994, 979], \"label\": \"Text\"}, {\"bbox_2d\": [138, 588, 546, 979], \"label\": \"Text\"}, {\"bbox_2d\": [544, 989, 567, 1003], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [776, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 94, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [158, 143, 363, 968], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 145, 858, 968], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 93, 524, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [776, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [146, 130, 392, 976], \"label\": \"Picture\"}, {\"bbox_2d\": [526, 130, 865, 980], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 98, 416, 108], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 141, 562, 157], \"label\": \"Text\"}, {\"bbox_2d\": [121, 221, 909, 263], \"label\": \"Text\"}, {\"bbox_2d\": [121, 357, 909, 398], \"label\": \"Text\"}, {\"bbox_2d\": [147, 411, 620, 427], \"label\": \"Text\"}, {\"bbox_2d\": [121, 522, 909, 592], \"label\": \"Text\"}, {\"bbox_2d\": [121, 658, 909, 724], \"label\": \"Text\"}, {\"bbox_2d\": [121, 883, 909, 952], \"label\": \"Text\"}, {\"bbox_2d\": [501, 988, 511, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 746, 909, 867], \"label\": \"Formula\"}, {\"bbox_2d\": [252, 606, 909, 645], \"label\": \"Formula\"}, {\"bbox_2d\": [250, 439, 909, 501], \"label\": \"Formula\"}, {\"bbox_2d\": [338, 311, 692, 335], \"label\": \"Formula\"}, {\"bbox_2d\": [318, 174, 909, 214], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 98, 415, 109], \"label\": \"Section-header\"}, {\"bbox_2d\": [147, 141, 675, 155], \"label\": \"Text\"}, {\"bbox_2d\": [121, 223, 911, 264], \"label\": \"Text\"}, {\"bbox_2d\": [121, 358, 911, 399], \"label\": \"Text\"}, {\"bbox_2d\": [147, 411, 652, 425], \"label\": \"Text\"}, {\"bbox_2d\": [121, 523, 911, 591], \"label\": \"Text\"}, {\"bbox_2d\": [121, 658, 910, 726], \"label\": \"Text\"}, {\"bbox_2d\": [121, 885, 911, 952], \"label\": \"Text\"}, {\"bbox_2d\": [317, 168, 910, 215], \"label\": \"Formula\"}, {\"bbox_2d\": [336, 311, 910, 331], \"label\": \"Formula\"}, {\"bbox_2d\": [361, 442, 910, 497], \"label\": \"Formula\"}, {\"bbox_2d\": [252, 603, 910, 651], \"label\": \"Formula\"}, {\"bbox_2d\": [370, 743, 910, 868], \"label\": \"Formula\"}, {\"bbox_2d\": [502, 989, 511, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [64, 29, 353, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [62, 89, 477, 164], \"label\": \"Text\"}, {\"bbox_2d\": [62, 165, 476, 276], \"label\": \"Text\"}, {\"bbox_2d\": [62, 277, 476, 390], \"label\": \"Text\"}, {\"bbox_2d\": [515, 238, 947, 390], \"label\": \"Text\"}, {\"bbox_2d\": [515, 89, 933, 239], \"label\": \"Text\"}, {\"bbox_2d\": [483, 968, 513, 986], \"label\": \"Page-footer\"}, {\"bbox_2d\": [158, 451, 832, 883], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [65, 30, 365, 77], \"label\": \"Page-header\"}, {\"bbox_2d\": [517, 88, 939, 237], \"label\": \"Text\"}, {\"bbox_2d\": [516, 238, 955, 387], \"label\": \"Text\"}, {\"bbox_2d\": [63, 88, 484, 162], \"label\": \"Text\"}, {\"bbox_2d\": [63, 163, 489, 274], \"label\": \"Text\"}, {\"bbox_2d\": [63, 275, 484, 387], \"label\": \"Text\"}, {\"bbox_2d\": [155, 450, 838, 883], \"label\": \"Picture\"}, {\"bbox_2d\": [484, 966, 517, 981], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [828, 77, 845, 92], \"label\": \"Page-header\"}, {\"bbox_2d\": [355, 137, 866, 211], \"label\": \"Formula\"}, {\"bbox_2d\": [518, 322, 866, 340], \"label\": \"Formula\"}, {\"bbox_2d\": [131, 320, 456, 380], \"label\": \"Formula\"}, {\"bbox_2d\": [131, 363, 456, 425], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 251, 456, 314], \"label\": \"Text\"}, {\"bbox_2d\": [509, 251, 866, 296], \"label\": \"Text\"}, {\"bbox_2d\": [121, 409, 456, 795], \"label\": \"Text\"}, {\"bbox_2d\": [509, 371, 866, 589], \"label\": \"Text\"}, {\"bbox_2d\": [509, 623, 866, 863], \"label\": \"Text\"}, {\"bbox_2d\": [509, 866, 866, 973], \"label\": \"Text\"}, {\"bbox_2d\": [131, 880, 456, 973], \"label\": \"Text\"}, {\"bbox_2d\": [131, 831, 468, 856], \"label\": \"Section-header\"}, {\"bbox_2d\": [509, 595, 866, 613], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [857, 80, 865, 92], \"label\": \"Page-header\"}, {\"bbox_2d\": [355, 135, 865, 204], \"label\": \"Formula\"}, {\"bbox_2d\": [121, 248, 478, 310], \"label\": \"Text\"}, {\"bbox_2d\": [121, 408, 478, 795], \"label\": \"Text\"}, {\"bbox_2d\": [121, 881, 478, 1002], \"label\": \"Text\"}, {\"bbox_2d\": [508, 866, 865, 1002], \"label\": \"Text\"}, {\"bbox_2d\": [508, 626, 865, 862], \"label\": \"Text\"}, {\"bbox_2d\": [508, 372, 865, 562], \"label\": \"Text\"}, {\"bbox_2d\": [508, 251, 865, 294], \"label\": \"Text\"}, {\"bbox_2d\": [131, 322, 478, 354], \"label\": \"Formula\"}, {\"bbox_2d\": [131, 362, 478, 394], \"label\": \"Formula\"}, {\"bbox_2d\": [517, 315, 865, 339], \"label\": \"Formula\"}, {\"bbox_2d\": [508, 582, 865, 622], \"label\": \"Formula\"}, {\"bbox_2d\": [134, 831, 465, 855], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [152, 108, 733, 122], \"label\": \"Text\"}, {\"bbox_2d\": [152, 190, 791, 203], \"label\": \"Text\"}, {\"bbox_2d\": [133, 270, 901, 325], \"label\": \"Text\"}, {\"bbox_2d\": [133, 392, 901, 566], \"label\": \"Text\"}, {\"bbox_2d\": [133, 653, 738, 666], \"label\": \"Text\"}, {\"bbox_2d\": [133, 755, 736, 768], \"label\": \"Text\"}, {\"bbox_2d\": [133, 755, 736, 768], \"label\": \"Text\"}, {\"bbox_2d\": [152, 835, 724, 849], \"label\": \"Text\"}, {\"bbox_2d\": [152, 916, 730, 929], \"label\": \"Text\"}, {\"bbox_2d\": [152, 835, 724, 849], \"label\": \"Text\"}, {\"bbox_2d\": [152, 835, 724, 849], \"label\": \"Text\"}, {\"bbox_2d\": [152, 835, 724, 849], \"label\": \"Text\"}, {\"bbox_2d\": [152, 835, 724, 849], \"label\": \"Text\"}, {\"bbox_2d\": [152, 835, 724, 849], \"label\": \"Text\"}, {\"bbox_2d\": [190, 695, 886, 729], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 594, 876, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 797, 872, 810], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 876, 856, 890], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 352, 852, 366], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 231, 820, 245], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 151, 883, 164], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 958, 527, 974], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [152, 106, 739, 122], \"label\": \"Text\"}, {\"bbox_2d\": [152, 187, 797, 203], \"label\": \"Text\"}, {\"bbox_2d\": [133, 268, 905, 324], \"label\": \"Text\"}, {\"bbox_2d\": [133, 389, 907, 566], \"label\": \"Text\"}, {\"bbox_2d\": [152, 833, 731, 849], \"label\": \"Text\"}, {\"bbox_2d\": [152, 913, 737, 929], \"label\": \"Text\"}, {\"bbox_2d\": [133, 651, 746, 667], \"label\": \"List-item\"}, {\"bbox_2d\": [133, 752, 743, 768], \"label\": \"List-item\"}, {\"bbox_2d\": [190, 146, 890, 163], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 227, 827, 243], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 348, 860, 364], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 591, 875, 627], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 691, 886, 728], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 792, 879, 808], \"label\": \"Section-header\"}, {\"bbox_2d\": [190, 873, 862, 889], \"label\": \"Section-header\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [776, 64, 983, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [571, 108, 988, 198], \"label\": \"Text\"}, {\"bbox_2d\": [571, 203, 988, 611], \"label\": \"Text\"}, {\"bbox_2d\": [118, 109, 540, 481], \"label\": \"Text\"}, {\"bbox_2d\": [118, 487, 540, 706], \"label\": \"Text\"}, {\"bbox_2d\": [118, 712, 539, 839], \"label\": \"Text\"}, {\"bbox_2d\": [118, 845, 539, 980], \"label\": \"Text\"}, {\"bbox_2d\": [613, 856, 870, 872], \"label\": \"List-item\"}, {\"bbox_2d\": [613, 876, 955, 891], \"label\": \"List-item\"}, {\"bbox_2d\": [613, 895, 988, 926], \"label\": \"List-item\"}, {\"bbox_2d\": [571, 618, 988, 852], \"label\": \"Text\"}, {\"bbox_2d\": [539, 994, 565, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [571, 932, 988, 982], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [777, 59, 1001, 90], \"label\": \"Page-header\"}, {\"bbox_2d\": [572, 103, 991, 196], \"label\": \"Text\"}, {\"bbox_2d\": [119, 104, 549, 479], \"label\": \"Text\"}, {\"bbox_2d\": [573, 197, 997, 609], \"label\": \"Text\"}, {\"bbox_2d\": [117, 479, 543, 703], \"label\": \"Text\"}, {\"bbox_2d\": [573, 610, 997, 848], \"label\": \"Text\"}, {\"bbox_2d\": [120, 704, 543, 835], \"label\": \"Text\"}, {\"bbox_2d\": [120, 838, 544, 975], \"label\": \"Text\"}, {\"bbox_2d\": [573, 923, 994, 977], \"label\": \"Text\"}, {\"bbox_2d\": [613, 850, 879, 867], \"label\": \"List-item\"}, {\"bbox_2d\": [613, 868, 969, 885], \"label\": \"List-item\"}, {\"bbox_2d\": [613, 886, 986, 922], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 986, 573, 1001], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 120, 309, 134], \"label\": \"Page-header\"}, {\"bbox_2d\": [294, 255, 702, 273], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 365, 429, 375], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 502, 455, 512], \"label\": \"Section-header\"}, {\"bbox_2d\": [294, 606, 474, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [62, 1003, 223, 1011], \"label\": \"Page-footer\"}, {\"bbox_2d\": [294, 623, 881, 668], \"label\": \"Text\"}, {\"bbox_2d\": [294, 675, 868, 702], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 709, 868, 737], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 744, 884, 806], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 519, 859, 530], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 537, 828, 565], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 572, 685, 582], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 297, 879, 341], \"label\": \"Text\"}, {\"bbox_2d\": [294, 382, 880, 445], \"label\": \"Text\"}, {\"bbox_2d\": [319, 452, 861, 479], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [12, 362, 22, 431], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 123, 311, 134], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 255, 702, 274], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 365, 429, 375], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 503, 455, 513], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 607, 474, 616], \"label\": \"Section-header\"}, {\"bbox_2d\": [296, 297, 880, 343], \"label\": \"Text\"}, {\"bbox_2d\": [296, 383, 881, 447], \"label\": \"Text\"}, {\"bbox_2d\": [296, 452, 862, 481], \"label\": \"Text\"}, {\"bbox_2d\": [296, 520, 859, 533], \"label\": \"List-item\"}, {\"bbox_2d\": [296, 538, 829, 567], \"label\": \"List-item\"}, {\"bbox_2d\": [296, 572, 686, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [296, 675, 867, 705], \"label\": \"List-item\"}, {\"bbox_2d\": [296, 710, 870, 739], \"label\": \"List-item\"}, {\"bbox_2d\": [296, 744, 883, 808], \"label\": \"List-item\"}, {\"bbox_2d\": [296, 624, 882, 671], \"label\": \"Text\"}, {\"bbox_2d\": [60, 1002, 223, 1012], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [934, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [106, 68, 495, 327], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 339, 369, 352], \"label\": \"Text\"}, {\"bbox_2d\": [106, 363, 506, 460], \"label\": \"Text\"}, {\"bbox_2d\": [106, 467, 506, 575], \"label\": \"Text\"}, {\"bbox_2d\": [131, 599, 496, 630], \"label\": \"Section-header\"}, {\"bbox_2d\": [106, 644, 506, 784], \"label\": \"Text\"}, {\"bbox_2d\": [106, 786, 506, 947], \"label\": \"Text\"}, {\"bbox_2d\": [536, 646, 951, 754], \"label\": \"Text\"}, {\"bbox_2d\": [536, 756, 951, 842], \"label\": \"Text\"}, {\"bbox_2d\": [536, 903, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [655, 872, 831, 883], \"label\": \"Section-header\"}, {\"bbox_2d\": [536, 512, 951, 648], \"label\": \"Text\"}, {\"bbox_2d\": [536, 375, 811, 469], \"label\": \"Text\"}, {\"bbox_2d\": [536, 263, 951, 365], \"label\": \"Text\"}, {\"bbox_2d\": [536, 72, 951, 233], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [934, 38, 951, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 72, 951, 233], \"label\": \"Text\"}, {\"bbox_2d\": [536, 236, 951, 338], \"label\": \"Text\"}, {\"bbox_2d\": [102, 68, 495, 329], \"label\": \"Formula\"}, {\"bbox_2d\": [686, 350, 802, 379], \"label\": \"Formula\"}, {\"bbox_2d\": [687, 474, 800, 503], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 339, 376, 351], \"label\": \"Text\"}, {\"bbox_2d\": [91, 365, 506, 466], \"label\": \"Text\"}, {\"bbox_2d\": [536, 391, 951, 462], \"label\": \"Text\"}, {\"bbox_2d\": [91, 470, 506, 571], \"label\": \"Text\"}, {\"bbox_2d\": [536, 513, 951, 645], \"label\": \"Text\"}, {\"bbox_2d\": [91, 650, 506, 781], \"label\": \"Text\"}, {\"bbox_2d\": [536, 648, 951, 750], \"label\": \"Text\"}, {\"bbox_2d\": [536, 753, 951, 840], \"label\": \"Text\"}, {\"bbox_2d\": [91, 784, 506, 946], \"label\": \"Text\"}, {\"bbox_2d\": [536, 904, 951, 946], \"label\": \"Text\"}, {\"bbox_2d\": [657, 872, 830, 882], \"label\": \"Section-header\"}, {\"bbox_2d\": [131, 603, 465, 627], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [355, 84, 926, 144], \"label\": \"Text\"}, {\"bbox_2d\": [355, 171, 921, 231], \"label\": \"Text\"}, {\"bbox_2d\": [355, 257, 926, 301], \"label\": \"Text\"}, {\"bbox_2d\": [292, 320, 922, 347], \"label\": \"Text\"}, {\"bbox_2d\": [292, 367, 888, 395], \"label\": \"Text\"}, {\"bbox_2d\": [292, 737, 926, 765], \"label\": \"Text\"}, {\"bbox_2d\": [709, 979, 931, 991], \"label\": \"Page-footer\"}, {\"bbox_2d\": [292, 777, 647, 790], \"label\": \"Caption\"}, {\"bbox_2d\": [292, 241, 430, 253], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 155, 376, 167], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 794, 933, 938], \"label\": \"Table\"}, {\"bbox_2d\": [317, 408, 810, 688], \"label\": \"Text\"}, {\"bbox_2d\": [292, 401, 931, 710], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [354, 85, 925, 145], \"label\": \"Text\"}, {\"bbox_2d\": [354, 171, 918, 231], \"label\": \"Text\"}, {\"bbox_2d\": [354, 257, 920, 302], \"label\": \"Text\"}, {\"bbox_2d\": [293, 320, 917, 349], \"label\": \"Text\"}, {\"bbox_2d\": [293, 367, 884, 396], \"label\": \"Text\"}, {\"bbox_2d\": [293, 737, 922, 766], \"label\": \"Text\"}, {\"bbox_2d\": [293, 778, 648, 789], \"label\": \"Caption\"}, {\"bbox_2d\": [293, 155, 376, 168], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 242, 430, 255], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 790, 932, 940], \"label\": \"Table\"}, {\"bbox_2d\": [709, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [493, 407, 668, 419], \"label\": \"Section-header\"}, {\"bbox_2d\": [316, 429, 810, 571], \"label\": \"Table\"}, {\"bbox_2d\": [317, 677, 540, 689], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [439, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 328, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [543, 111, 741, 121], \"label\": \"Section-header\"}, {\"bbox_2d\": [131, 111, 500, 165], \"label\": \"Text\"}, {\"bbox_2d\": [131, 179, 500, 219], \"label\": \"Text\"}, {\"bbox_2d\": [526, 276, 894, 388], \"label\": \"Text\"}, {\"bbox_2d\": [131, 384, 498, 409], \"label\": \"Text\"}, {\"bbox_2d\": [198, 254, 407, 356], \"label\": \"Picture\"}, {\"bbox_2d\": [658, 399, 760, 407], \"label\": \"Text\"}, {\"bbox_2d\": [619, 154, 791, 247], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [303, 93, 326, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [438, 73, 587, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 93, 723, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [541, 110, 741, 121], \"label\": \"Text\"}, {\"bbox_2d\": [130, 110, 503, 167], \"label\": \"Text\"}, {\"bbox_2d\": [130, 178, 500, 219], \"label\": \"Text\"}, {\"bbox_2d\": [880, 150, 893, 157], \"label\": \"Text\"}, {\"bbox_2d\": [476, 251, 498, 259], \"label\": \"Text\"}, {\"bbox_2d\": [526, 273, 895, 390], \"label\": \"Text\"}, {\"bbox_2d\": [660, 401, 764, 415], \"label\": \"Text\"}, {\"bbox_2d\": [130, 382, 500, 409], \"label\": \"Text\"}, {\"bbox_2d\": [196, 254, 414, 357], \"label\": \"Picture\"}, {\"bbox_2d\": [615, 150, 787, 255], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [122, 49, 296, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [122, 126, 917, 159], \"label\": \"Text\"}, {\"bbox_2d\": [122, 161, 917, 258], \"label\": \"Text\"}, {\"bbox_2d\": [122, 260, 917, 374], \"label\": \"Text\"}, {\"bbox_2d\": [122, 376, 917, 423], \"label\": \"Text\"}, {\"bbox_2d\": [122, 425, 917, 473], \"label\": \"Text\"}, {\"bbox_2d\": [122, 475, 917, 508], \"label\": \"Text\"}, {\"bbox_2d\": [122, 509, 917, 572], \"label\": \"Text\"}, {\"bbox_2d\": [122, 574, 917, 607], \"label\": \"Text\"}, {\"bbox_2d\": [122, 609, 917, 673], \"label\": \"Text\"}, {\"bbox_2d\": [122, 693, 897, 756], \"label\": \"Text\"}, {\"bbox_2d\": [122, 758, 917, 799], \"label\": \"Text\"}, {\"bbox_2d\": [122, 675, 490, 689], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 908, 356, 920], \"label\": \"Section-header\"}, {\"bbox_2d\": [122, 974, 441, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 792, 703, 804], \"label\": \"List-item\"}, {\"bbox_2d\": [147, 808, 734, 820], \"label\": \"List-item\"}, {\"bbox_2d\": [122, 824, 917, 904], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 126, 954, 159], \"label\": \"Text\"}, {\"bbox_2d\": [121, 159, 962, 258], \"label\": \"Text\"}, {\"bbox_2d\": [121, 259, 946, 374], \"label\": \"Text\"}, {\"bbox_2d\": [121, 375, 963, 424], \"label\": \"Text\"}, {\"bbox_2d\": [121, 424, 952, 474], \"label\": \"Text\"}, {\"bbox_2d\": [121, 474, 954, 507], \"label\": \"Text\"}, {\"bbox_2d\": [121, 507, 955, 573], \"label\": \"Text\"}, {\"bbox_2d\": [121, 573, 952, 606], \"label\": \"Text\"}, {\"bbox_2d\": [121, 606, 958, 672], \"label\": \"Text\"}, {\"bbox_2d\": [121, 689, 920, 755], \"label\": \"Text\"}, {\"bbox_2d\": [121, 755, 955, 788], \"label\": \"Text\"}, {\"bbox_2d\": [121, 821, 944, 904], \"label\": \"Text\"}, {\"bbox_2d\": [145, 788, 721, 804], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 805, 756, 821], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 672, 502, 688], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 904, 359, 920], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 75, 468, 94], \"label\": \"Text\"}, {\"bbox_2d\": [87, 129, 935, 196], \"label\": \"Text\"}, {\"bbox_2d\": [87, 201, 955, 250], \"label\": \"Text\"}, {\"bbox_2d\": [87, 258, 945, 310], \"label\": \"Text\"}, {\"bbox_2d\": [87, 646, 945, 700], \"label\": \"Text\"}, {\"bbox_2d\": [87, 705, 962, 754], \"label\": \"Text\"}, {\"bbox_2d\": [87, 762, 965, 826], \"label\": \"Text\"}, {\"bbox_2d\": [87, 859, 937, 925], \"label\": \"Text\"}, {\"bbox_2d\": [87, 931, 981, 980], \"label\": \"Text\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}, {\"bbox_2d\": [87, 831, 609, 850], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 503, 238, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 306, 250, 324], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 330, 421, 351], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 353, 972, 391], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 393, 377, 411], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 414, 685, 435], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 437, 895, 458], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 461, 926, 494], \"label\": \"List-item\"}, {\"bbox_2d\": [87, 503, 238, 521], \"label\": \"Section-header\"}, {\"bbox_2d\": [152, 524, 524, 545], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 547, 974, 568], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 571, 565, 592], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 594, 686, 615], \"label\": \"List-item\"}, {\"bbox_2d\": [152, 617, 527, 628], \"label\": \"List-item\"}]```", "label": "```json[{\"bbox_2d\": [485, 14, 947, 41], \"label\": \"Page-header\"}, {\"bbox_2d\": [87, 74, 466, 94], \"label\": \"Text\"}, {\"bbox_2d\": [87, 127, 939, 191], \"label\": \"Text\"}, {\"bbox_2d\": [87, 200, 955, 249], \"label\": \"Text\"}, {\"bbox_2d\": [87, 257, 942, 306], \"label\": \"Text\"}, {\"bbox_2d\": [87, 647, 888, 696], \"label\": \"Text\"}, {\"bbox_2d\": [87, 705, 964, 754], \"label\": \"Text\"}, {\"bbox_2d\": [87, 762, 967, 826], \"label\": \"Text\"}, {\"bbox_2d\": [87, 860, 938, 923], \"label\": \"Text\"}, {\"bbox_2d\": [87, 932, 981, 981], \"label\": \"Text\"}, {\"bbox_2d\": [87, 832, 611, 851], \"label\": \"Section-header\"}, {\"bbox_2d\": [87, 99, 369, 118], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 308, 279, 327], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 329, 422, 348], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 350, 972, 369], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 370, 510, 390], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 398, 279, 417], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 418, 685, 438], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 439, 781, 459], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 460, 925, 495], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 502, 237, 522], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 523, 523, 543], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 544, 972, 563], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 565, 676, 584], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 586, 685, 605], \"label\": \"List-item\"}, {\"bbox_2d\": [129, 607, 529, 626], \"label\": \"List-item\"}, {\"bbox_2d\": [485, 1003, 585, 1019], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [388, 70, 654, 84], \"label\": \"Section-header\"}, {\"bbox_2d\": [292, 98, 605, 111], \"label\": \"Text\"}, {\"bbox_2d\": [266, 122, 609, 136], \"label\": \"Text\"}, {\"bbox_2d\": [165, 137, 736, 150], \"label\": \"Text\"}, {\"bbox_2d\": [316, 151, 736, 176], \"label\": \"Text\"}, {\"bbox_2d\": [183, 186, 736, 251], \"label\": \"Text\"}, {\"bbox_2d\": [183, 267, 473, 277], \"label\": \"Text\"}, {\"bbox_2d\": [183, 308, 506, 438], \"label\": \"Text\"}, {\"bbox_2d\": [183, 443, 506, 515], \"label\": \"Text\"}, {\"bbox_2d\": [183, 520, 506, 668], \"label\": \"Text\"}, {\"bbox_2d\": [183, 673, 506, 725], \"label\": \"Text\"}, {\"bbox_2d\": [536, 554, 951, 601], \"label\": \"Text\"}, {\"bbox_2d\": [536, 678, 951, 721], \"label\": \"Text\"}, {\"bbox_2d\": [91, 776, 506, 820], \"label\": \"Text\"}, {\"bbox_2d\": [536, 856, 951, 881], \"label\": \"Text\"}, {\"bbox_2d\": [536, 399, 628, 413], \"label\": \"Text\"}, {\"bbox_2d\": [536, 460, 888, 474], \"label\": \"Text\"}, {\"bbox_2d\": [536, 348, 951, 386], \"label\": \"Formula\"}, {\"bbox_2d\": [656, 429, 951, 443], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 479, 951, 538], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 619, 951, 669], \"label\": \"Formula\"}, {\"bbox_2d\": [593, 712, 951, 836], \"label\": \"Formula\"}, {\"bbox_2d\": [189, 741, 506, 765], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 836, 506, 853], \"label\": \"Formula\"}, {\"bbox_2d\": [592, 897, 951, 950], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 911, 365, 923], \"label\": \"Footnote\"}, {\"bbox_2d\": [91, 924, 229, 936], \"label\": \"Footnote\"}, {\"bbox_2d\": [91, 937, 359, 949], \"label\": \"Footnote\"}, {\"bbox_2d\": [32, 264, 61, 713], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [388, 70, 654, 84], \"label\": \"Title\"}, {\"bbox_2d\": [326, 101, 715, 116], \"label\": \"Text\"}, {\"bbox_2d\": [267, 122, 775, 133], \"label\": \"Text\"}, {\"bbox_2d\": [252, 136, 790, 146], \"label\": \"Text\"}, {\"bbox_2d\": [166, 149, 876, 174], \"label\": \"Text\"}, {\"bbox_2d\": [184, 185, 858, 251], \"label\": \"Text\"}, {\"bbox_2d\": [184, 267, 473, 277], \"label\": \"Text\"}, {\"bbox_2d\": [91, 306, 506, 439], \"label\": \"Text\"}, {\"bbox_2d\": [91, 442, 506, 515], \"label\": \"Text\"}, {\"bbox_2d\": [91, 518, 506, 666], \"label\": \"Text\"}, {\"bbox_2d\": [91, 669, 506, 727], \"label\": \"Text\"}, {\"bbox_2d\": [91, 776, 506, 819], \"label\": \"Text\"}, {\"bbox_2d\": [536, 853, 951, 881], \"label\": \"Text\"}, {\"bbox_2d\": [536, 678, 951, 721], \"label\": \"Text\"}, {\"bbox_2d\": [536, 555, 951, 598], \"label\": \"Text\"}, {\"bbox_2d\": [553, 459, 887, 484], \"label\": \"Text\"}, {\"bbox_2d\": [536, 400, 741, 413], \"label\": \"Text\"}, {\"bbox_2d\": [536, 306, 951, 343], \"label\": \"Text\"}, {\"bbox_2d\": [91, 912, 364, 922], \"label\": \"Footnote\"}, {\"bbox_2d\": [91, 925, 339, 935], \"label\": \"Footnote\"}, {\"bbox_2d\": [91, 938, 355, 947], \"label\": \"Footnote\"}, {\"bbox_2d\": [91, 837, 521, 865], \"label\": \"Formula\"}, {\"bbox_2d\": [189, 745, 506, 768], \"label\": \"Formula\"}, {\"bbox_2d\": [31, 261, 61, 701], \"label\": \"Page-header\"}, {\"bbox_2d\": [589, 351, 951, 373], \"label\": \"Formula\"}, {\"bbox_2d\": [552, 370, 951, 392], \"label\": \"Formula\"}, {\"bbox_2d\": [661, 427, 951, 452], \"label\": \"Formula\"}, {\"bbox_2d\": [561, 491, 951, 545], \"label\": \"Formula\"}, {\"bbox_2d\": [612, 614, 951, 661], \"label\": \"Formula\"}, {\"bbox_2d\": [568, 736, 951, 837], \"label\": \"Formula\"}, {\"bbox_2d\": [605, 894, 951, 958], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 73, 893, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [156, 153, 426, 965], \"label\": \"Picture\"}, {\"bbox_2d\": [553, 156, 754, 966], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [130, 74, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 522, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 74, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [158, 152, 428, 966], \"label\": \"Picture\"}, {\"bbox_2d\": [548, 152, 755, 961], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 70, 153, 84], \"label\": \"Text\"}, {\"bbox_2d\": [91, 148, 506, 279], \"label\": \"Text\"}, {\"bbox_2d\": [91, 282, 506, 376], \"label\": \"Text\"}, {\"bbox_2d\": [91, 419, 116, 443], \"label\": \"Text\"}, {\"bbox_2d\": [91, 475, 506, 590], \"label\": \"Text\"}, {\"bbox_2d\": [91, 593, 506, 682], \"label\": \"Text\"}, {\"bbox_2d\": [91, 747, 506, 776], \"label\": \"Text\"}, {\"bbox_2d\": [91, 833, 506, 879], \"label\": \"Text\"}, {\"bbox_2d\": [91, 881, 246, 894], \"label\": \"Text\"}, {\"bbox_2d\": [155, 717, 385, 725], \"label\": \"Section-header\"}, {\"bbox_2d\": [228, 794, 506, 829], \"label\": \"Formula\"}, {\"bbox_2d\": [500, 822, 901, 829], \"label\": \"Formula\"}, {\"bbox_2d\": [500, 839, 901, 862], \"label\": \"Formula\"}, {\"bbox_2d\": [500, 892, 901, 918], \"label\": \"Formula\"}, {\"bbox_2d\": [161, 937, 506, 949], \"label\": \"Formula\"}, {\"bbox_2d\": [204, 388, 506, 416], \"label\": \"Formula\"}, {\"bbox_2d\": [171, 452, 506, 488], \"label\": \"Formula\"}, {\"bbox_2d\": [575, 407, 901, 443], \"label\": \"Formula\"}, {\"bbox_2d\": [575, 501, 901, 537], \"label\": \"Formula\"}, {\"bbox_2d\": [546, 561, 901, 597], \"label\": \"Formula\"}, {\"bbox_2d\": [635, 591, 901, 616], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 635, 901, 736], \"label\": \"Text\"}, {\"bbox_2d\": [536, 739, 901, 786], \"label\": \"Text\"}, {\"bbox_2d\": [536, 841, 901, 862], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 942, 901, 949], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 327, 901, 393], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 449, 564, 465], \"label\": \"Text\"}, {\"bbox_2d\": [536, 412, 576, 428], \"label\": \"Text\"}, {\"bbox_2d\": [536, 560, 564, 576], \"label\": \"Text\"}, {\"bbox_2d\": [536, 399, 576, 416], \"label\": \"Text\"}, {\"bbox_2d\": [536, 597, 564, 616], \"label\": \"Text\"}, {\"bbox_2d\": [536, 619, 576, 635], \"label\": \"Text\"}, {\"bbox_2d\": [536, 638, 901, 669], \"label\": \"Text\"}, {\"bbox_2d\": [536, 796, 564, 812], \"label\": \"Text\"}, {\"bbox_2d\": [536, 874, 564, 890], \"label\": \"Text\"}, {\"bbox_2d\": [536, 717, 901, 736], \"label\": \"Text\"}, {\"bbox_2d\": [536, 739, 901, 786], \"label\": \"Text\"}, {\"bbox_2d\": [536, 822, 901, 862], \"label\": \"Formula\"}, {\"bbox_2d\": [564, 922, 901, 949], \"label\": \"Formula\"}, {\"bbox_2d\": [564, 942, 901, 949], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 892, 901, 918], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 864, 564, 890], \"label\": \"Text\"}, {\"bbox_2d\": [169, 907, 506, 949], \"label\": \"Formula\"}, {\"bbox_2d\": [91, 98, 506, 157], \"label\": \"Formula\"}, {\"bbox_2d\": [536, 68, 952, 307], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [91, 71, 153, 84], \"label\": \"Text\"}, {\"bbox_2d\": [91, 147, 506, 280], \"label\": \"Text\"}, {\"bbox_2d\": [91, 283, 506, 371], \"label\": \"Text\"}, {\"bbox_2d\": [91, 431, 118, 444], \"label\": \"Text\"}, {\"bbox_2d\": [91, 502, 506, 590], \"label\": \"Text\"}, {\"bbox_2d\": [91, 593, 506, 681], \"label\": \"Text\"}, {\"bbox_2d\": [91, 747, 506, 775], \"label\": \"Text\"}, {\"bbox_2d\": [91, 836, 506, 879], \"label\": \"Text\"}, {\"bbox_2d\": [108, 881, 301, 894], \"label\": \"Text\"}, {\"bbox_2d\": [536, 635, 951, 738], \"label\": \"Text\"}, {\"bbox_2d\": [536, 738, 951, 783], \"label\": \"Text\"}, {\"bbox_2d\": [536, 839, 569, 852], \"label\": \"Text\"}, {\"bbox_2d\": [553, 562, 949, 585], \"label\": \"Text\"}, {\"bbox_2d\": [536, 481, 563, 494], \"label\": \"Text\"}, {\"bbox_2d\": [536, 414, 574, 427], \"label\": \"Text\"}, {\"bbox_2d\": [536, 316, 951, 383], \"label\": \"Caption\"}, {\"bbox_2d\": [156, 715, 440, 725], \"label\": \"Section-header\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [92, 98, 506, 145], \"label\": \"Formula\"}, {\"bbox_2d\": [207, 386, 506, 417], \"label\": \"Formula\"}, {\"bbox_2d\": [154, 457, 506, 488], \"label\": \"Formula\"}, {\"bbox_2d\": [575, 437, 951, 476], \"label\": \"Formula\"}, {\"bbox_2d\": [577, 503, 951, 548], \"label\": \"Formula\"}, {\"bbox_2d\": [638, 580, 951, 621], \"label\": \"Formula\"}, {\"bbox_2d\": [230, 789, 506, 823], \"label\": \"Formula\"}, {\"bbox_2d\": [585, 795, 951, 834], \"label\": \"Formula\"}, {\"bbox_2d\": [160, 903, 506, 935], \"label\": \"Formula\"}, {\"bbox_2d\": [565, 862, 951, 892], \"label\": \"Formula\"}, {\"bbox_2d\": [572, 918, 951, 948], \"label\": \"Formula\"}, {\"bbox_2d\": [531, 67, 952, 299], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [143, 102, 899, 904], \"label\": \"Table\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [143, 103, 903, 903], \"label\": \"Table\"}, {\"bbox_2d\": [507, 958, 533, 974], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [264, 103, 914, 152], \"label\": \"Text\"}, {\"bbox_2d\": [231, 175, 917, 204], \"label\": \"List-item\"}, {\"bbox_2d\": [255, 218, 732, 233], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 408, 558, 427], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 428, 878, 461], \"label\": \"Text\"}, {\"bbox_2d\": [231, 474, 567, 489], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 697, 459, 712], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 840, 459, 855], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 546, 917, 612], \"label\": \"Text\"}, {\"bbox_2d\": [231, 652, 926, 684], \"label\": \"Text\"}, {\"bbox_2d\": [231, 635, 400, 651], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 775, 400, 793], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 721, 785, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 796, 883, 828], \"label\": \"Text\"}, {\"bbox_2d\": [231, 864, 640, 890], \"label\": \"Text\"}, {\"bbox_2d\": [241, 916, 888, 944], \"label\": \"Text\"}, {\"bbox_2d\": [255, 242, 872, 383], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 775, 400, 793], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 697, 459, 712], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 546, 917, 612], \"label\": \"Text\"}, {\"bbox_2d\": [231, 652, 820, 684], \"label\": \"Text\"}, {\"bbox_2d\": [231, 796, 883, 828], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 546, 926, 684], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 546, 917, 612], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 546, 926, 684], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 721, 784, 748], \"label\": \"Text\"}, {\"bbox_2d\": [231, 546, 917, 612], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799, 525], \"label\": \"Text\"}, {\"bbox_2d\": [231, 497, 799,", "label": "```json[{\"bbox_2d\": [266, 107, 916, 151], \"label\": \"Text\"}, {\"bbox_2d\": [231, 432, 881, 460], \"label\": \"Text\"}, {\"bbox_2d\": [241, 551, 916, 610], \"label\": \"Text\"}, {\"bbox_2d\": [231, 656, 926, 683], \"label\": \"Text\"}, {\"bbox_2d\": [231, 798, 882, 826], \"label\": \"Text\"}, {\"bbox_2d\": [241, 917, 890, 945], \"label\": \"Text\"}, {\"bbox_2d\": [758, 988, 876, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [898, 987, 926, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 843, 452, 853], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 865, 634, 890], \"label\": \"Text\"}, {\"bbox_2d\": [231, 722, 779, 747], \"label\": \"Text\"}, {\"bbox_2d\": [231, 499, 794, 524], \"label\": \"Text\"}, {\"bbox_2d\": [255, 244, 873, 381], \"label\": \"Text\"}, {\"bbox_2d\": [255, 221, 731, 232], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 476, 562, 487], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 700, 453, 711], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 778, 395, 793], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 636, 395, 650], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 412, 553, 427], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 177, 918, 205], \"label\": \"List-item\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 96, 911, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 369, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [121, 582, 911, 953], \"label\": \"Text\"}, {\"bbox_2d\": [496, 989, 516, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 539, 380, 551], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [121, 95, 911, 357], \"label\": \"Text\"}, {\"bbox_2d\": [121, 369, 911, 494], \"label\": \"Text\"}, {\"bbox_2d\": [121, 582, 911, 964], \"label\": \"Text\"}, {\"bbox_2d\": [496, 988, 516, 1004], \"label\": \"Page-footer\"}, {\"bbox_2d\": [147, 538, 380, 551], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [438, 47, 598, 59], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 74, 630, 92], \"label\": \"Text\"}, {\"bbox_2d\": [121, 208, 911, 254], \"label\": \"Text\"}, {\"bbox_2d\": [121, 270, 911, 322], \"label\": \"Text\"}, {\"bbox_2d\": [121, 331, 813, 377], \"label\": \"Text\"}, {\"bbox_2d\": [121, 386, 911, 466], \"label\": \"Text\"}, {\"bbox_2d\": [121, 476, 911, 557], \"label\": \"Text\"}, {\"bbox_2d\": [121, 686, 305, 701], \"label\": \"Text\"}, {\"bbox_2d\": [141, 713, 911, 962], \"label\": \"Text\"}, {\"bbox_2d\": [121, 767, 911, 821], \"label\": \"Text\"}, {\"bbox_2d\": [141, 834, 911, 962], \"label\": \"Text\"}, {\"bbox_2d\": [141, 894, 911, 962], \"label\": \"Text\"}, {\"bbox_2d\": [277, 113, 644, 177], \"label\": \"Formula\"}, {\"bbox_2d\": [175, 575, 663, 644], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [121, 48, 137, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [436, 48, 599, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 76, 630, 92], \"label\": \"Text\"}, {\"bbox_2d\": [275, 111, 642, 160], \"label\": \"Formula\"}, {\"bbox_2d\": [367, 162, 760, 177], \"label\": \"Text\"}, {\"bbox_2d\": [121, 206, 914, 266], \"label\": \"Text\"}, {\"bbox_2d\": [121, 260, 914, 327], \"label\": \"Text\"}, {\"bbox_2d\": [121, 327, 914, 387], \"label\": \"Text\"}, {\"bbox_2d\": [121, 388, 914, 466], \"label\": \"Text\"}, {\"bbox_2d\": [121, 477, 914, 557], \"label\": \"Text\"}, {\"bbox_2d\": [256, 578, 779, 653], \"label\": \"Formula\"}, {\"bbox_2d\": [142, 686, 307, 713], \"label\": \"Text\"}, {\"bbox_2d\": [121, 713, 914, 965], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [293, 83, 508, 97], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 640, 523, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 147, 481, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [355, 163, 897, 191], \"label\": \"Text\"}, {\"bbox_2d\": [355, 109, 903, 136], \"label\": \"Text\"}, {\"bbox_2d\": [415, 202, 596, 214], \"label\": \"Text\"}, {\"bbox_2d\": [415, 295, 747, 308], \"label\": \"Text\"}, {\"bbox_2d\": [415, 375, 596, 379], \"label\": \"Text\"}, {\"bbox_2d\": [415, 442, 902, 453], \"label\": \"Text\"}, {\"bbox_2d\": [415, 501, 771, 512], \"label\": \"Text\"}, {\"bbox_2d\": [415, 524, 596, 536], \"label\": \"Text\"}, {\"bbox_2d\": [415, 603, 903, 629], \"label\": \"Text\"}, {\"bbox_2d\": [355, 665, 876, 692], \"label\": \"Text\"}, {\"bbox_2d\": [355, 704, 536, 716], \"label\": \"Text\"}, {\"bbox_2d\": [415, 744, 921, 771], \"label\": \"Text\"}, {\"bbox_2d\": [355, 783, 548, 794], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 799, 926, 856], \"label\": \"Text\"}, {\"bbox_2d\": [355, 868, 494, 879], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 884, 926, 927], \"label\": \"Text\"}, {\"bbox_2d\": [355, 869, 381, 880], \"label\": \"Text\"}, {\"bbox_2d\": [355, 640, 523, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [355, 727, 414, 738], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 547, 474, 558], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 390, 474, 401], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 225, 474, 237], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 280, 469, 291], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 319, 798, 331], \"label\": \"Section-header\"}, {\"bbox_2d\": [355, 343, 769, 355], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 447, 497, 457], \"label\": \"Text\"}, {\"bbox_2d\": [415, 406, 926, 434], \"label\": \"Text\"}, {\"bbox_2d\": [415, 459, 902, 472], \"label\": \"Text\"}, {\"bbox_2d\": [415, 563, 929, 590], \"label\": \"Text\"}, {\"bbox_2d\": [415, 603, 502, 613], \"label\": \"Text\"}, {\"bbox_2d\": [415, 742, 920, 771], \"label\": \"Text\"}, {\"bbox_2d\": [415, 801, 926, 858], \"label\": \"Text\"}, {\"bbox_2d\": [594, 979, 932, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [293, 85, 507, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 148, 481, 161], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 226, 473, 239], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 281, 469, 294], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 390, 473, 403], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 445, 497, 458], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 484, 409, 497], \"label\": \"Section-header\"}, {\"bbox_2d\": [415, 547, 473, 560], \"label\": \"Section-header\"}, {\"bbox_2d\": [293, 641, 523, 654], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 727, 412, 740], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 782, 547, 795], \"label\": \"Section-header\"}, {\"bbox_2d\": [354, 868, 494, 881], \"label\": \"Section-header\"}, {\"bbox_2d\": [594, 978, 932, 993], \"label\": \"Page-footer\"}, {\"bbox_2d\": [415, 884, 926, 928], \"label\": \"Text\"}, {\"bbox_2d\": [415, 798, 924, 858], \"label\": \"Text\"}, {\"bbox_2d\": [415, 743, 920, 772], \"label\": \"Text\"}, {\"bbox_2d\": [354, 704, 536, 717], \"label\": \"Text\"}, {\"bbox_2d\": [354, 664, 876, 693], \"label\": \"Text\"}, {\"bbox_2d\": [476, 617, 904, 631], \"label\": \"Text\"}, {\"bbox_2d\": [415, 602, 500, 615], \"label\": \"Section-header\"}, {\"bbox_2d\": [476, 563, 929, 591], \"label\": \"Text\"}, {\"bbox_2d\": [415, 523, 597, 537], \"label\": \"Text\"}, {\"bbox_2d\": [415, 500, 771, 513], \"label\": \"Text\"}, {\"bbox_2d\": [476, 461, 901, 474], \"label\": \"Text\"}, {\"bbox_2d\": [476, 406, 926, 435], \"label\": \"Text\"}, {\"bbox_2d\": [415, 367, 597, 380], \"label\": \"Text\"}, {\"bbox_2d\": [476, 296, 748, 310], \"label\": \"Text\"}, {\"bbox_2d\": [476, 242, 904, 270], \"label\": \"Text\"}, {\"bbox_2d\": [415, 202, 597, 216], \"label\": \"Text\"}, {\"bbox_2d\": [415, 163, 896, 192], \"label\": \"Text\"}, {\"bbox_2d\": [415, 320, 798, 333], \"label\": \"List-item\"}, {\"bbox_2d\": [354, 343, 768, 357], \"label\": \"List-item\"}, {\"bbox_2d\": [354, 108, 903, 137], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"Text\"}, {\"bbox_2d\": [231, 371, 646, 383], \"label\": \"Text\"}, {\"bbox_2d\": [231, 632, 926, 674], \"label\": \"Text\"}, {\"bbox_2d\": [231, 692, 916, 735], \"label\": \"Text\"}, {\"bbox_2d\": [231, 343, 838, 355], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 599, 831, 613], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 914, 664, 925], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 140, 928, 341], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 402, 928, 599], \"label\": \"Picture\"}, {\"bbox_2d\": [230, 753, 728, 909], \"label\": \"Picture\"}, {\"bbox_2d\": [507, 987, 926, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 926, 121], \"label\": \"Text\"}, {\"bbox_2d\": [231, 371, 646, 383], \"label\": \"Text\"}, {\"bbox_2d\": [231, 631, 926, 674], \"label\": \"Text\"}, {\"bbox_2d\": [231, 691, 916, 735], \"label\": \"Text\"}, {\"bbox_2d\": [229, 751, 728, 909], \"label\": \"Picture\"}, {\"bbox_2d\": [230, 403, 927, 597], \"label\": \"Picture\"}, {\"bbox_2d\": [229, 139, 927, 336], \"label\": \"Picture\"}, {\"bbox_2d\": [231, 342, 837, 353], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 602, 831, 613], \"label\": \"Caption\"}, {\"bbox_2d\": [231, 914, 664, 925], \"label\": \"Caption\"}, {\"bbox_2d\": [505, 987, 926, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [274, 83, 406, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [274, 194, 641, 208], \"label\": \"Text\"}, {\"bbox_2d\": [274, 536, 536, 550], \"label\": \"Text\"}, {\"bbox_2d\": [274, 708, 855, 722], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 557, 873, 585], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 591, 910, 638], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 644, 855, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 712, 868, 753], \"label\": \"Text\"}, {\"bbox_2d\": [274, 759, 860, 774], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 780, 448, 794], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 811, 646, 825], \"label\": \"Text\"}, {\"bbox_2d\": [274, 103, 765, 117], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 123, 749, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 143, 765, 157], \"label\": \"List-item\"}, {\"bbox_2d\": [274, 163, 448, 176], \"label\": \"List-item\"}, {\"bbox_2d\": [294, 220, 777, 488], \"label\": \"Picture\"}, {\"bbox_2d\": [274, 689, 406, 702], \"label\": \"Section-header\"}, {\"bbox_2d\": [299, 741, 868, 753], \"label\": \"Text\"}, {\"bbox_2d\": [101, 979, 370, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [273, 85, 406, 98], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 195, 640, 208], \"label\": \"Text\"}, {\"bbox_2d\": [273, 104, 765, 118], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 122, 748, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 141, 765, 157], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 163, 449, 176], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 555, 873, 586], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 590, 909, 637], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 641, 854, 672], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 710, 854, 723], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 759, 860, 774], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 781, 449, 794], \"label\": \"List-item\"}, {\"bbox_2d\": [273, 812, 646, 825], \"label\": \"Text\"}, {\"bbox_2d\": [300, 741, 868, 755], \"label\": \"Text\"}, {\"bbox_2d\": [273, 691, 406, 704], \"label\": \"Section-header\"}, {\"bbox_2d\": [273, 538, 535, 551], \"label\": \"Text\"}, {\"bbox_2d\": [297, 219, 776, 489], \"label\": \"Text\"}, {\"bbox_2d\": [101, 978, 370, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [150, 93, 910, 282], \"label\": \"Text\"}, {\"bbox_2d\": [150, 296, 910, 512], \"label\": \"Text\"}, {\"bbox_2d\": [150, 526, 910, 691], \"label\": \"Text\"}, {\"bbox_2d\": [150, 705, 910, 946], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [150, 92, 910, 283], \"label\": \"Text\"}, {\"bbox_2d\": [150, 296, 911, 513], \"label\": \"Text\"}, {\"bbox_2d\": [150, 526, 910, 691], \"label\": \"Text\"}, {\"bbox_2d\": [150, 704, 911, 945], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [121, 48, 298, 79], \"label\": \"Page-header\"}, {\"bbox_2d\": [121, 126, 918, 209], \"label\": \"Text\"}, {\"bbox_2d\": [121, 209, 917, 276], \"label\": \"Text\"}, {\"bbox_2d\": [121, 276, 909, 310], \"label\": \"Text\"}, {\"bbox_2d\": [121, 309, 883, 359], \"label\": \"Text\"}, {\"bbox_2d\": [253, 359, 782, 408], \"label\": \"Section-header\"}, {\"bbox_2d\": [121, 406, 297, 423], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 423, 886, 453], \"label\": \"Text\"}, {\"bbox_2d\": [121, 453, 878, 483], \"label\": \"Text\"}, {\"bbox_2d\": [121, 483, 642, 501], \"label\": \"Text\"}, {\"bbox_2d\": [121, 501, 775, 514], \"label\": \"Text\"}, {\"bbox_2d\": [121, 514, 917, 605], \"label\": \"Text\"}, {\"bbox_2d\": [121, 605, 908, 635], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 635, 917, 679], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 679, 917, 709], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 709, 918, 785], \"label\": \"Text\"}, {\"bbox_2d\": [121, 785, 905, 831], \"label\": \"Text\"}, {\"bbox_2d\": [121, 831, 627, 847], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 847, 544, 876], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 875, 917, 905], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 905, 859, 935], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 973, 444, 988], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [121, 47, 206, 63], \"label\": \"Text\"}, {\"bbox_2d\": [121, 63, 300, 79], \"label\": \"Text\"}, {\"bbox_2d\": [121, 126, 956, 209], \"label\": \"Text\"}, {\"bbox_2d\": [121, 209, 956, 275], \"label\": \"Text\"}, {\"bbox_2d\": [121, 275, 943, 308], \"label\": \"Text\"}, {\"bbox_2d\": [121, 308, 921, 358], \"label\": \"Text\"}, {\"bbox_2d\": [121, 408, 296, 422], \"label\": \"Text\"}, {\"bbox_2d\": [121, 423, 900, 452], \"label\": \"Text\"}, {\"bbox_2d\": [121, 453, 900, 482], \"label\": \"Text\"}, {\"bbox_2d\": [145, 483, 650, 497], \"label\": \"Text\"}, {\"bbox_2d\": [145, 498, 792, 512], \"label\": \"Text\"}, {\"bbox_2d\": [121, 513, 957, 603], \"label\": \"Text\"}, {\"bbox_2d\": [121, 603, 913, 633], \"label\": \"Text\"}, {\"bbox_2d\": [121, 633, 950, 678], \"label\": \"Text\"}, {\"bbox_2d\": [121, 678, 949, 708], \"label\": \"Text\"}, {\"bbox_2d\": [121, 709, 954, 783], \"label\": \"Text\"}, {\"bbox_2d\": [121, 784, 938, 829], \"label\": \"Text\"}, {\"bbox_2d\": [121, 859, 312, 874], \"label\": \"Text\"}, {\"bbox_2d\": [121, 874, 949, 904], \"label\": \"Text\"}, {\"bbox_2d\": [121, 904, 885, 934], \"label\": \"Text\"}, {\"bbox_2d\": [145, 829, 625, 843], \"label\": \"List-item\"}, {\"bbox_2d\": [145, 844, 550, 858], \"label\": \"List-item\"}, {\"bbox_2d\": [121, 972, 454, 988], \"label\": \"Page-footer\"}, {\"bbox_2d\": [252, 358, 788, 407], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 70, 394, 123], \"label\": \"Text\"}, {\"bbox_2d\": [98, 971, 236, 982], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [98, 71, 393, 122], \"label\": \"Text\"}, {\"bbox_2d\": [98, 971, 234, 980], \"label\": \"Page-footer\"}, {\"bbox_2d\": [0, 370, 1035, 581], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 373, 506, 431], \"label\": \"Text\"}, {\"bbox_2d\": [91, 451, 506, 811], \"label\": \"Text\"}, {\"bbox_2d\": [91, 812, 506, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 658, 951, 767], \"label\": \"Text\"}, {\"bbox_2d\": [536, 771, 951, 948], \"label\": \"Text\"}, {\"bbox_2d\": [536, 563, 951, 644], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 274, 951, 327], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 274, 506, 327], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 67, 912, 257], \"label\": \"Picture\"}, {\"bbox_2d\": [125, 66, 470, 266], \"label\": \"Picture\"}, {\"bbox_2d\": [536, 355, 912, 519], \"label\": \"Picture\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}]```", "label": "```json[{\"bbox_2d\": [536, 273, 951, 326], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 275, 506, 328], \"label\": \"Caption\"}, {\"bbox_2d\": [536, 563, 951, 616], \"label\": \"Caption\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [125, 64, 471, 258], \"label\": \"Picture\"}, {\"bbox_2d\": [571, 67, 915, 258], \"label\": \"Picture\"}, {\"bbox_2d\": [568, 358, 919, 549], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 374, 506, 432], \"label\": \"Text\"}, {\"bbox_2d\": [536, 662, 951, 765], \"label\": \"Text\"}, {\"bbox_2d\": [91, 452, 506, 809], \"label\": \"Text\"}, {\"bbox_2d\": [536, 769, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 814, 506, 947], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [746, 49, 924, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [112, 102, 913, 952], \"label\": \"Picture\"}, {\"bbox_2d\": [303, 958, 735, 979], \"label\": \"Caption\"}, {\"bbox_2d\": [510, 988, 529, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [113, 97, 909, 950], \"label\": \"Picture\"}, {\"bbox_2d\": [739, 50, 934, 81], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 966, 742, 982], \"label\": \"Caption\"}, {\"bbox_2d\": [511, 987, 535, 1002], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [526, 39, 546, 48], \"label\": \"Page-header\"}, {\"bbox_2d\": [117, 78, 367, 93], \"label\": \"Text\"}, {\"bbox_2d\": [117, 98, 957, 152], \"label\": \"Text\"}, {\"bbox_2d\": [117, 157, 957, 191], \"label\": \"Text\"}, {\"bbox_2d\": [117, 196, 957, 273], \"label\": \"Text\"}, {\"bbox_2d\": [117, 279, 616, 290], \"label\": \"Text\"}, {\"bbox_2d\": [117, 295, 957, 368], \"label\": \"Text\"}, {\"bbox_2d\": [117, 373, 957, 468], \"label\": \"Text\"}, {\"bbox_2d\": [117, 473, 957, 505], \"label\": \"Text\"}, {\"bbox_2d\": [117, 510, 957, 544], \"label\": \"Text\"}, {\"bbox_2d\": [117, 549, 957, 604], \"label\": \"Text\"}, {\"bbox_2d\": [117, 609, 957, 704], \"label\": \"Text\"}, {\"bbox_2d\": [117, 709, 957, 878], \"label\": \"Text\"}, {\"bbox_2d\": [117, 883, 957, 939], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [528, 40, 544, 49], \"label\": \"Page-header\"}, {\"bbox_2d\": [116, 76, 362, 93], \"label\": \"Text\"}, {\"bbox_2d\": [116, 96, 959, 152], \"label\": \"Text\"}, {\"bbox_2d\": [116, 156, 959, 192], \"label\": \"Text\"}, {\"bbox_2d\": [116, 196, 959, 271], \"label\": \"Text\"}, {\"bbox_2d\": [162, 274, 610, 290], \"label\": \"Text\"}, {\"bbox_2d\": [116, 293, 959, 368], \"label\": \"Text\"}, {\"bbox_2d\": [117, 372, 959, 467], \"label\": \"Text\"}, {\"bbox_2d\": [117, 470, 959, 506], \"label\": \"Text\"}, {\"bbox_2d\": [117, 509, 959, 545], \"label\": \"Text\"}, {\"bbox_2d\": [117, 549, 959, 604], \"label\": \"Text\"}, {\"bbox_2d\": [117, 608, 959, 703], \"label\": \"Text\"}, {\"bbox_2d\": [117, 706, 960, 880], \"label\": \"Text\"}, {\"bbox_2d\": [118, 882, 961, 940], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [25, 23, 155, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 68, 875, 81], \"label\": \"Section-header\"}, {\"bbox_2d\": [304, 89, 729, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 111, 306, 133], \"label\": \"Text\"}, {\"bbox_2d\": [25, 145, 352, 156], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 162, 1009, 229], \"label\": \"Text\"}, {\"bbox_2d\": [25, 239, 1009, 283], \"label\": \"Text\"}, {\"bbox_2d\": [25, 311, 1009, 357], \"label\": \"Text\"}, {\"bbox_2d\": [25, 367, 1009, 445], \"label\": \"Text\"}, {\"bbox_2d\": [25, 473, 1009, 528], \"label\": \"Text\"}, {\"bbox_2d\": [25, 296, 136, 307], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 457, 164, 468], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 553, 525, 564], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [25, 25, 155, 36], \"label\": \"Page-header\"}, {\"bbox_2d\": [25, 71, 875, 82], \"label\": \"Section-header\"}, {\"bbox_2d\": [304, 91, 730, 102], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 146, 352, 157], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 296, 135, 307], \"label\": \"Section-header\"}, {\"bbox_2d\": [25, 458, 162, 469], \"label\": \"Section-header\"}, {\"bbox_2d\": [508, 553, 525, 564], \"label\": \"Page-footer\"}, {\"bbox_2d\": [25, 474, 1009, 530], \"label\": \"Text\"}, {\"bbox_2d\": [25, 366, 1009, 445], \"label\": \"Text\"}, {\"bbox_2d\": [25, 312, 1009, 357], \"label\": \"Text\"}, {\"bbox_2d\": [25, 238, 1009, 283], \"label\": \"Text\"}, {\"bbox_2d\": [25, 162, 1009, 229], \"label\": \"Text\"}, {\"bbox_2d\": [25, 111, 306, 133], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [123, 66, 473, 262], \"label\": \"Picture\"}, {\"bbox_2d\": [91, 269, 506, 309], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 350, 506, 643], \"label\": \"Text\"}, {\"bbox_2d\": [91, 648, 506, 766], \"label\": \"Text\"}, {\"bbox_2d\": [91, 843, 506, 875], \"label\": \"Text\"}, {\"bbox_2d\": [536, 71, 951, 673], \"label\": \"Text\"}, {\"bbox_2d\": [536, 677, 951, 812], \"label\": \"Text\"}, {\"bbox_2d\": [536, 815, 951, 945], \"label\": \"Text\"}, {\"bbox_2d\": [115, 799, 481, 825], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 891, 506, 956], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [126, 67, 471, 254], \"label\": \"Picture\"}, {\"bbox_2d\": [943, 37, 951, 50], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 269, 506, 308], \"label\": \"Caption\"}, {\"bbox_2d\": [91, 349, 506, 647], \"label\": \"Text\"}, {\"bbox_2d\": [536, 68, 952, 673], \"label\": \"Text\"}, {\"bbox_2d\": [91, 649, 506, 767], \"label\": \"Text\"}, {\"bbox_2d\": [536, 677, 951, 810], \"label\": \"Text\"}, {\"bbox_2d\": [536, 814, 951, 947], \"label\": \"Text\"}, {\"bbox_2d\": [91, 844, 506, 872], \"label\": \"Text\"}, {\"bbox_2d\": [115, 798, 482, 824], \"label\": \"Section-header\"}, {\"bbox_2d\": [95, 885, 506, 947], \"label\": \"Formula\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [99, 979, 360, 991], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [101, 978, 360, 993], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 27, 389, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 647, 164], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 173, 425, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 287, 613, 299], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 190, 510, 273], \"label\": \"Text\"}, {\"bbox_2d\": [91, 298, 510, 424], \"label\": \"Text\"}, {\"bbox_2d\": [528, 94, 947, 139], \"label\": \"Text\"}, {\"bbox_2d\": [528, 146, 947, 273], \"label\": \"Text\"}, {\"bbox_2d\": [91, 283, 394, 295], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 301, 860, 313], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 300, 510, 419], \"label\": \"Text\"}, {\"bbox_2d\": [528, 317, 947, 371], \"label\": \"Text\"}, {\"bbox_2d\": [528, 382, 947, 437], \"label\": \"Text\"}, {\"bbox_2d\": [528, 439, 947, 465], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 467, 947, 494], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 505, 947, 544], \"label\": \"Text\"}, {\"bbox_2d\": [91, 497, 386, 509], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 514, 510, 613], \"label\": \"Text\"}, {\"bbox_2d\": [91, 638, 510, 664], \"label\": \"Text\"}, {\"bbox_2d\": [528, 557, 947, 626], \"label\": \"Text\"}, {\"bbox_2d\": [528, 633, 947, 675], \"label\": \"Text\"}, {\"bbox_2d\": [91, 666, 289, 678], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 680, 193, 692], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 687, 883, 699], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 706, 947, 743], \"label\": \"Text\"}, {\"bbox_2d\": [528, 746, 947, 772], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 775, 947, 829], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 700, 510, 757], \"label\": \"Text\"}, {\"bbox_2d\": [528, 838, 947, 880], \"label\": \"Text\"}, {\"bbox_2d\": [528, 890, 858, 902], \"label\": \"Text\"}, {\"bbox_2d\": [91, 766, 510, 852], \"label\": \"Text\"}, {\"bbox_2d\": [91, 861, 510, 888], \"label\": \"Text\"}, {\"bbox_2d\": [91, 890, 206, 902], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 905, 255, 917], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 920, 254, 932], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 999, 452, 1009], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 27, 389, 35], \"label\": \"Page-header\"}, {\"bbox_2d\": [91, 95, 507, 163], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 667, 289, 677], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 681, 194, 691], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 745, 944, 770], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 774, 944, 827], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 892, 205, 902], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 906, 255, 916], \"label\": \"List-item\"}, {\"bbox_2d\": [91, 920, 253, 930], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 440, 944, 465], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 469, 944, 494], \"label\": \"List-item\"}, {\"bbox_2d\": [528, 289, 613, 299], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 1000, 451, 1008], \"label\": \"Page-footer\"}, {\"bbox_2d\": [528, 890, 859, 900], \"label\": \"Text\"}, {\"bbox_2d\": [528, 839, 944, 878], \"label\": \"Text\"}, {\"bbox_2d\": [91, 863, 507, 887], \"label\": \"Text\"}, {\"bbox_2d\": [91, 769, 507, 851], \"label\": \"Text\"}, {\"bbox_2d\": [91, 703, 507, 756], \"label\": \"Text\"}, {\"bbox_2d\": [528, 702, 944, 741], \"label\": \"Text\"}, {\"bbox_2d\": [528, 95, 944, 134], \"label\": \"Text\"}, {\"bbox_2d\": [528, 146, 944, 271], \"label\": \"Text\"}, {\"bbox_2d\": [91, 189, 507, 271], \"label\": \"Text\"}, {\"bbox_2d\": [91, 298, 507, 423], \"label\": \"Text\"}, {\"bbox_2d\": [91, 435, 507, 474], \"label\": \"Text\"}, {\"bbox_2d\": [91, 500, 507, 611], \"label\": \"Text\"}, {\"bbox_2d\": [91, 638, 507, 662], \"label\": \"Text\"}, {\"bbox_2d\": [528, 637, 944, 675], \"label\": \"Text\"}, {\"bbox_2d\": [528, 557, 944, 624], \"label\": \"Text\"}, {\"bbox_2d\": [528, 506, 944, 545], \"label\": \"Text\"}, {\"bbox_2d\": [528, 383, 944, 436], \"label\": \"Text\"}, {\"bbox_2d\": [528, 317, 944, 371], \"label\": \"Text\"}, {\"bbox_2d\": [528, 303, 860, 313], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 283, 395, 293], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 175, 423, 185], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 486, 386, 496], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 623, 320, 633], \"label\": \"Section-header\"}, {\"bbox_2d\": [528, 688, 884, 698], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [98, 46, 496, 145], \"label\": \"Text\"}, {\"bbox_2d\": [577, 46, 976, 269], \"label\": \"Text\"}, {\"bbox_2d\": [98, 151, 496, 248], \"label\": \"Text\"}, {\"bbox_2d\": [98, 256, 496, 290], \"label\": \"Text\"}, {\"bbox_2d\": [577, 792, 780, 946], \"label\": \"Text\"}, {\"bbox_2d\": [98, 792, 288, 897], \"label\": \"Text\"}, {\"bbox_2d\": [320, 792, 503, 897], \"label\": \"Text\"}, {\"bbox_2d\": [959, 992, 973, 1002], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [98, 48, 495, 142], \"label\": \"Text\"}, {\"bbox_2d\": [98, 153, 495, 247], \"label\": \"Text\"}, {\"bbox_2d\": [98, 257, 505, 289], \"label\": \"Text\"}, {\"bbox_2d\": [576, 48, 973, 268], \"label\": \"Text\"}, {\"bbox_2d\": [319, 792, 503, 896], \"label\": \"Text\"}, {\"bbox_2d\": [98, 792, 286, 897], \"label\": \"List-item\"}, {\"bbox_2d\": [578, 792, 779, 944], \"label\": \"List-item\"}, {\"bbox_2d\": [958, 994, 973, 1003], \"label\": \"Page-footer\"}, {\"bbox_2d\": [0, 304, 976, 777], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [699, 93, 723, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [440, 73, 585, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 93, 328, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [536, 113, 895, 159], \"label\": \"Text\"}, {\"bbox_2d\": [536, 163, 895, 208], \"label\": \"Text\"}, {\"bbox_2d\": [536, 211, 895, 233], \"label\": \"Text\"}, {\"bbox_2d\": [536, 237, 895, 273], \"label\": \"Text\"}, {\"bbox_2d\": [536, 276, 895, 308], \"label\": \"Text\"}, {\"bbox_2d\": [536, 311, 895, 361], \"label\": \"Text\"}, {\"bbox_2d\": [536, 364, 895, 414], \"label\": \"Text\"}, {\"bbox_2d\": [536, 417, 895, 452], \"label\": \"Text\"}, {\"bbox_2d\": [536, 455, 895, 505], \"label\": \"Text\"}, {\"bbox_2d\": [536, 508, 895, 543], \"label\": \"Text\"}, {\"bbox_2d\": [536, 546, 895, 582], \"label\": \"Text\"}, {\"bbox_2d\": [536, 585, 895, 618], \"label\": \"Text\"}, {\"bbox_2d\": [536, 622, 895, 644], \"label\": \"Text\"}, {\"bbox_2d\": [536, 647, 895, 697], \"label\": \"Text\"}, {\"bbox_2d\": [536, 700, 895, 750], \"label\": \"Text\"}, {\"bbox_2d\": [536, 753, 895, 810], \"label\": \"Text\"}, {\"bbox_2d\": [536, 813, 895, 846], \"label\": \"Text\"}, {\"bbox_2d\": [536, 849, 895, 871], \"label\": \"Text\"}, {\"bbox_2d\": [536, 874, 895, 907], \"label\": \"Text\"}, {\"bbox_2d\": [125, 112, 501, 212], \"label\": \"Text\"}, {\"bbox_2d\": [125, 215, 500, 265], \"label\": \"Text\"}, {\"bbox_2d\": [125, 272, 500, 342], \"label\": \"Text\"}, {\"bbox_2d\": [125, 345, 501, 404], \"label\": \"Text\"}, {\"bbox_2d\": [125, 407, 501, 512], \"label\": \"Text\"}, {\"bbox_2d\": [125, 515, 501, 591], \"label\": \"Text\"}, {\"bbox_2d\": [125, 594, 501, 670], \"label\": \"Text\"}, {\"bbox_2d\": [125, 673, 501, 732], \"label\": \"Text\"}, {\"bbox_2d\": [125, 735, 501, 761], \"label\": \"Text\"}, {\"bbox_2d\": [125, 764, 501, 840], \"label\": \"Text\"}, {\"bbox_2d\": [125, 843, 501, 919], \"label\": \"Text\"}, {\"bbox_2d\": [125, 922, 501, 945], \"label\": \"Text\"}, {\"bbox_2d\": [660, 936, 762, 945], \"label\": \"Text\"}, {\"bbox_2d\": [618, 907, 895, 928], \"label\": \"Text\"}, {\"bbox_2d\": [618, 881, 895, 907], \"label\": \"Text\"}, {\"bbox_2d\": [618, 854, 895, 874], \"label\": \"Text\"}, {\"bbox_2d\": [618, 813, 895, 846], \"label\": \"Text\"}, {\"bbox_2d\": [618, 756, 895, 789], \"label\": \"Text\"}, {\"bbox_2d\": [618, 700, 895, 746], \"label\": \"Text\"}, {\"bbox_2d\": [618, 659, 895, 697], \"label\": \"Text\"}, {\"bbox_2d\": [618, 617, 895, 653], \"label\": \"Text\"}, {\"bbox_2d\": [618, 575, 895, 608], \"label\": \"Text\"}, {\"bbox_2d\": [618, 546, 895, 575], \"label\": \"Text\"}, {\"bbox_2d\": [618, 488, 895, 543], \"label\": \"Text\"}, {\"bbox_2d\": [618, 459, 895, 488], \"label\": \"Text\"}, {\"bbox_2d\": [618, 415, 895, 452], \"label\": \"Text\"}, {\"bbox_2d\": [618, 353, 895, 414], \"label\": \"Text\"}, {\"bbox_2d\": [618, 325, 895, 353], \"label\": \"Text\"}, {\"bbox_2d\": [618, 286, 895, 325], \"label\": \"Text\"}, {\"bbox_2d\": [618, 223, 895, 273], \"label\": \"Text\"}, {\"bbox_2d\": [618, 211, 895, 223], \"label\": \"Text\"}, {\"bbox_2d\": [125, 376, 348, 386], \"label\": \"Text\"}, {\"bbox_2d\": [125, 390, 300, 404], \"label\": \"Text\"}, {\"bbox_2d\": [125, 407, 300, 417], \"label\": \"Text\"}, {\"bbox_2d\": [125, 421, 348, 431], \"label\": \"Text\"}, {\"bbox_2d\": [125, 435, 406, 445], \"label\": \"Text\"}, {\"bbox_2d\": [125, 449, 406, 459], \"label\": \"Text\"}, {\"bbox_2d\": [125, 463, 300, 473], \"label\": \"Text\"}, {\"bbox_2d\": [125, 477, 300, 487], \"label\": \"Text\"}, {\"bbox_2d\": [125, 491, 348, 501], \"label\": \"Text\"}, {\"bbox_2d\": [125, 505, 348, 515], \"label\": \"Text\"}, {\"bbox_2d\": [125, 519, 348, 529], \"", "label": "```json[{\"bbox_2d\": [440, 73, 586, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [698, 93, 723, 106], \"label\": \"Page-header\"}, {\"bbox_2d\": [303, 92, 326, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [131, 241, 500, 276], \"label\": \"Text\"}, {\"bbox_2d\": [130, 278, 500, 313], \"label\": \"Text\"}, {\"bbox_2d\": [131, 316, 500, 375], \"label\": \"Text\"}, {\"bbox_2d\": [146, 377, 501, 685], \"label\": \"Text\"}, {\"bbox_2d\": [131, 687, 501, 735], \"label\": \"Text\"}, {\"bbox_2d\": [131, 737, 500, 760], \"label\": \"Text\"}, {\"bbox_2d\": [130, 762, 500, 846], \"label\": \"Text\"}, {\"bbox_2d\": [130, 849, 500, 895], \"label\": \"Text\"}, {\"bbox_2d\": [130, 898, 500, 945], \"label\": \"Text\"}, {\"bbox_2d\": [660, 937, 764, 951], \"label\": \"Text\"}, {\"bbox_2d\": [526, 904, 894, 926], \"label\": \"Text\"}, {\"bbox_2d\": [526, 867, 895, 902], \"label\": \"Text\"}, {\"bbox_2d\": [526, 830, 895, 865], \"label\": \"Text\"}, {\"bbox_2d\": [526, 805, 894, 828], \"label\": \"Text\"}, {\"bbox_2d\": [526, 755, 898, 803], \"label\": \"Text\"}, {\"bbox_2d\": [526, 668, 896, 753], \"label\": \"Text\"}, {\"bbox_2d\": [526, 631, 894, 667], \"label\": \"Text\"}, {\"bbox_2d\": [524, 582, 894, 630], \"label\": \"Text\"}, {\"bbox_2d\": [524, 544, 895, 579], \"label\": \"Text\"}, {\"bbox_2d\": [526, 470, 895, 542], \"label\": \"Text\"}, {\"bbox_2d\": [526, 433, 894, 468], \"label\": \"Text\"}, {\"bbox_2d\": [526, 383, 895, 432], \"label\": \"Text\"}, {\"bbox_2d\": [526, 259, 895, 282], \"label\": \"Text\"}, {\"bbox_2d\": [526, 210, 895, 258], \"label\": \"Text\"}, {\"bbox_2d\": [526, 111, 891, 133], \"label\": \"Text\"}, {\"bbox_2d\": [541, 135, 895, 207], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 284, 898, 319], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 321, 898, 343], \"label\": \"List-item\"}, {\"bbox_2d\": [541, 346, 896, 381], \"label\": \"List-item\"}, {\"bbox_2d\": [131, 111, 500, 239], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [547, 43, 926, 56], \"label\": \"Page-header\"}, {\"bbox_2d\": [109, 43, 211, 55], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 93, 925, 953], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 692, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [109, 45, 211, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [546, 46, 926, 57], \"label\": \"Page-header\"}, {\"bbox_2d\": [231, 93, 923, 953], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 693, 999], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [937, 758, 966, 958], \"label\": \"Page-header\"}, {\"bbox_2d\": [34, 508, 44, 523], \"label\": \"Page-header\"}, {\"bbox_2d\": [50, 92, 912, 958], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [52, 89, 915, 963], \"label\": \"Picture\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [91, 71, 500, 90], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 323, 321, 342], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 667, 326, 686], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 115, 945, 287], \"label\": \"Text\"}, {\"bbox_2d\": [91, 365, 945, 632], \"label\": \"Text\"}, {\"bbox_2d\": [91, 711, 945, 882], \"label\": \"Text\"}, {\"bbox_2d\": [508, 994, 527, 1005], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [91, 75, 499, 89], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 326, 319, 340], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 671, 324, 684], \"label\": \"Section-header\"}, {\"bbox_2d\": [91, 712, 944, 882], \"label\": \"Text\"}, {\"bbox_2d\": [91, 367, 944, 631], \"label\": \"Text\"}, {\"bbox_2d\": [91, 116, 944, 287], \"label\": \"Text\"}, {\"bbox_2d\": [508, 993, 527, 1007], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [125, 90, 587, 103], \"label\": \"Text\"}, {\"bbox_2d\": [154, 114, 555, 127], \"label\": \"Text\"}, {\"bbox_2d\": [125, 221, 911, 285], \"label\": \"Text\"}, {\"bbox_2d\": [125, 377, 911, 418], \"label\": \"Text\"}, {\"bbox_2d\": [125, 518, 911, 630], \"label\": \"Text\"}, {\"bbox_2d\": [125, 641, 911, 851], \"label\": \"Text\"}, {\"bbox_2d\": [125, 862, 911, 898], \"label\": \"Text\"}, {\"bbox_2d\": [125, 909, 911, 947], \"label\": \"Text\"}, {\"bbox_2d\": [509, 974, 526, 987], \"label\": \"Page-footer\"}, {\"bbox_2d\": [358, 311, 911, 360], \"label\": \"Formula\"}, {\"bbox_2d\": [420, 150, 911, 202], \"label\": \"Formula\"}, {\"bbox_2d\": [238, 434, 911, 511], \"label\": \"Formula\"}]```", "label": "```json[{\"bbox_2d\": [125, 91, 586, 104], \"label\": \"Text\"}, {\"bbox_2d\": [151, 116, 555, 128], \"label\": \"Text\"}, {\"bbox_2d\": [125, 222, 911, 283], \"label\": \"Text\"}, {\"bbox_2d\": [246, 428, 911, 522], \"label\": \"Formula\"}, {\"bbox_2d\": [420, 141, 911, 216], \"label\": \"Formula\"}, {\"bbox_2d\": [362, 313, 911, 362], \"label\": \"Formula\"}, {\"bbox_2d\": [125, 862, 911, 898], \"label\": \"Text\"}, {\"bbox_2d\": [125, 910, 910, 947], \"label\": \"Text\"}, {\"bbox_2d\": [125, 641, 911, 849], \"label\": \"Text\"}, {\"bbox_2d\": [125, 518, 911, 629], \"label\": \"Text\"}, {\"bbox_2d\": [125, 379, 911, 418], \"label\": \"Text\"}, {\"bbox_2d\": [509, 974, 527, 987], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [109, 113, 520, 278], \"label\": \"Text\"}, {\"bbox_2d\": [559, 113, 972, 166], \"label\": \"Text\"}, {\"bbox_2d\": [559, 170, 972, 613], \"label\": \"Text\"}, {\"bbox_2d\": [559, 617, 977, 983], \"label\": \"Text\"}, {\"bbox_2d\": [109, 731, 524, 983], \"label\": \"Text\"}, {\"bbox_2d\": [527, 994, 553, 1006], \"label\": \"Page-footer\"}, {\"bbox_2d\": [109, 282, 520, 727], \"label\": \"Section-header\"}, {\"bbox_2d\": [767, 68, 969, 98], \"label\": \"Section-header\"}]```", "label": "```json[{\"bbox_2d\": [766, 67, 989, 98], \"label\": \"Page-header\"}, {\"bbox_2d\": [560, 110, 979, 165], \"label\": \"Text\"}, {\"bbox_2d\": [109, 110, 530, 276], \"label\": \"Text\"}, {\"bbox_2d\": [560, 166, 986, 612], \"label\": \"Text\"}, {\"bbox_2d\": [109, 278, 536, 724], \"label\": \"Text\"}, {\"bbox_2d\": [562, 613, 989, 978], \"label\": \"Text\"}, {\"bbox_2d\": [109, 725, 531, 978], \"label\": \"Text\"}, {\"bbox_2d\": [529, 989, 561, 1004], \"label\": \"Page-footer\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [72, 47, 404, 169], \"label\": \"Text\"}, {\"bbox_2d\": [72, 175, 404, 287], \"label\": \"Text\"}, {\"bbox_2d\": [72, 297, 404, 399], \"label\": \"Text\"}, {\"bbox_2d\": [72, 405, 404, 539], \"label\": \"Text\"}, {\"bbox_2d\": [72, 546, 404, 595], \"label\": \"Text\"}, {\"bbox_2d\": [91, 617, 1009, 809], \"label\": \"Text\"}, {\"bbox_2d\": [845, 553, 974, 599], \"label\": \"Caption\"}, {\"bbox_2d\": [1007, 1007, 1020, 1021], \"label\": \"Page-footer\"}, {\"bbox_2d\": [438, 0, 1035, 545], \"label\": \"Picture\"}]```", "label": "```json[{\"bbox_2d\": [73, 47, 330, 60], \"label\": \"Section-header\"}, {\"bbox_2d\": [73, 68, 401, 167], \"label\": \"Text\"}, {\"bbox_2d\": [73, 175, 402, 291], \"label\": \"Text\"}, {\"bbox_2d\": [73, 299, 401, 397], \"label\": \"Text\"}, {\"bbox_2d\": [73, 405, 401, 538], \"label\": \"Text\"}, {\"bbox_2d\": [73, 546, 401, 594], \"label\": \"Text\"}, {\"bbox_2d\": [438, 0, 1035, 545], \"label\": \"Picture\"}, {\"bbox_2d\": [845, 552, 973, 601], \"label\": \"Caption\"}, {\"bbox_2d\": [1006, 1013, 1021, 1023], \"label\": \"Page-footer\"}, {\"bbox_2d\": [92, 617, 1007, 810], \"label\": \"Text\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [502, 92, 524, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [459, 155, 529, 165], \"label\": \"Text\"}, {\"bbox_2d\": [157, 164, 848, 468], \"label\": \"Picture\"}, {\"bbox_2d\": [130, 495, 501, 519], \"label\": \"Text\"}, {\"bbox_2d\": [130, 532, 501, 677], \"label\": \"Text\"}, {\"bbox_2d\": [130, 681, 500, 736], \"label\": \"Text\"}, {\"bbox_2d\": [130, 749, 439, 763], \"label\": \"Text\"}, {\"bbox_2d\": [130, 776, 501, 885], \"label\": \"Text\"}, {\"bbox_2d\": [130, 889, 501, 954], \"label\": \"Text\"}, {\"bbox_2d\": [130, 958, 501, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 495, 894, 531], \"label\": \"Text\"}, {\"bbox_2d\": [526, 535, 895, 713], \"label\": \"Text\"}, {\"bbox_2d\": [526, 726, 650, 736], \"label\": \"Text\"}, {\"bbox_2d\": [526, 749, 895, 772], \"label\": \"Text\"}, {\"bbox_2d\": [526, 785, 895, 876], \"label\": \"Text\"}, {\"bbox_2d\": [526, 889, 895, 982], \"label\": \"Text\"}]```", "label": "```json[{\"bbox_2d\": [130, 73, 313, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [501, 93, 523, 107], \"label\": \"Page-header\"}, {\"bbox_2d\": [790, 73, 892, 87], \"label\": \"Page-header\"}, {\"bbox_2d\": [159, 154, 845, 462], \"label\": \"Picture\"}, {\"bbox_2d\": [131, 494, 499, 518], \"label\": \"Text\"}, {\"bbox_2d\": [130, 531, 501, 678], \"label\": \"Text\"}, {\"bbox_2d\": [130, 684, 499, 735], \"label\": \"Text\"}, {\"bbox_2d\": [131, 776, 500, 882], \"label\": \"Text\"}, {\"bbox_2d\": [131, 888, 500, 953], \"label\": \"Text\"}, {\"bbox_2d\": [131, 959, 495, 982], \"label\": \"Text\"}, {\"bbox_2d\": [526, 494, 895, 530], \"label\": \"Text\"}, {\"bbox_2d\": [526, 533, 895, 713], \"label\": \"Text\"}, {\"bbox_2d\": [526, 747, 895, 770], \"label\": \"Text\"}, {\"bbox_2d\": [526, 774, 895, 810], \"label\": \"Text\"}, {\"bbox_2d\": [527, 814, 894, 824], \"label\": \"Text\"}, {\"bbox_2d\": [526, 828, 896, 877], \"label\": \"Text\"}, {\"bbox_2d\": [526, 881, 898, 982], \"label\": \"Text\"}, {\"bbox_2d\": [131, 753, 441, 763], \"label\": \"Section-header\"}, {\"bbox_2d\": [526, 728, 651, 738], \"label\": \"Section-header\"}]```\n"} {"prompt": "system\nYou are a helpful assistant.\nuser\nPlease carefully observe the document and detect the following regions: \"Caption\", \"Footnote\", \"Formula\", \"List-item\", \"Page-footer\", \"Page-header\", \"Picture\", \"Section-header\", \"Table\", \"Text\", \"Title\". Output each detected region's bbox coordinates in JSON format. The format of the bbox coordinates is: \n```json[{\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}, {\"bbox_2d\": [x1, y1, x2, y2], \"label\": \"region name\"}]```.\nOutput the final answer in tags.\nassistant\n", "predict": "```json[{\"bbox_2d\": [231, 92, 924, 137], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 147, 909, 174], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 211, 569, 231], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 309, 715, 325], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 777, 368, 792], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 255, 903, 281], \"label\": \"Text\"}, {\"bbox_2d\": [231, 343, 914, 387], \"label\": \"Text\"}, {\"bbox_2d\": [231, 404, 926, 447], \"label\": \"Text\"}, {\"bbox_2d\": [231, 464, 911, 490], \"label\": \"Text\"}, {\"bbox_2d\": [231, 508, 909, 598], \"label\": \"Text\"}, {\"bbox_2d\": [231, 616, 926, 643], \"label\": \"Text\"}, {\"bbox_2d\": [231, 659, 926, 750], \"label\": \"Text\"}, {\"bbox_2d\": [231, 811, 926, 870], \"label\": \"Text\"}, {\"bbox_2d\": [231, 887, 926, 930], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 688, 999], \"label\": \"Page-footer\"}]```", "label": "```json[{\"bbox_2d\": [231, 93, 926, 136], \"label\": \"List-item\"}, {\"bbox_2d\": [231, 147, 910, 174], \"label\": \"List-item\"}, {\"bbox_2d\": [109, 212, 564, 231], \"label\": \"Section-header\"}, {\"bbox_2d\": [109, 309, 709, 324], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 254, 902, 282], \"label\": \"Text\"}, {\"bbox_2d\": [231, 343, 915, 386], \"label\": \"Text\"}, {\"bbox_2d\": [231, 403, 926, 446], \"label\": \"Text\"}, {\"bbox_2d\": [231, 463, 910, 491], \"label\": \"Text\"}, {\"bbox_2d\": [231, 508, 909, 598], \"label\": \"Text\"}, {\"bbox_2d\": [231, 615, 926, 643], \"label\": \"Text\"}, {\"bbox_2d\": [231, 659, 926, 750], \"label\": \"Text\"}, {\"bbox_2d\": [109, 777, 367, 793], \"label\": \"Section-header\"}, {\"bbox_2d\": [231, 811, 926, 870], \"label\": \"Text\"}, {\"bbox_2d\": [109, 987, 685, 999], \"label\": \"Page-footer\"}, {\"bbox_2d\": [231, 887, 926, 931], \"label\": \"Text\"}]```\n"}