safebooru 2023.06 rip + addons from yande-re, danbooru, e621, zerochan

Date: 2023-10-10 04:44 UTC
Seeders: 4
Information: No information.
Leechers: 0
File size: 220.7 GiB
Completed: 52
Info hash: 53740d076de34269c871c79fa49af7f701eb3c75

Here is the routinely produced **volume V2023B for the interval 03.2023-06.2023** in the series of composite safebooru-based rips:

- [12.2022 - 03.2023](https://nyaa.si/view/1720018) volume V2023A
- [08.2022 - 11.2022](https://nyaa.si/view/1634287) volume V2022D
- [05.2022 - 08.2022](https://nyaa.si/view/1574093) volume V2022C

and their predecessors, aimed to feed the **BOORU CHARS datasets [2021](https://nyaa.si/view/1384820) , [2015](https://nyaa.si/view/1468367) , [2022](https://nyaa.si/view/1547662)** and the upcoming 2023 one.

The following description is (recursively and boringly) similar to those of the previous volumes because the data pipeline is stable.

**These rips are not intended to be "complete and maximum quality" but rather a "representative best of", to help users not lose an interesting fandom or artist and get everything with a few clicks. Another reason to build this megalith is neural network training on art images. There are [promising results](https://github.com/aperveyev/booru_yolo), stay tuned.**

Sources used (_priorities high to low when deduplicating_):

* safebooru.org (ID 43xxxxx) **letter S** in archive/folder name
* yande.re (with some questionable images in separate Q-folders) **letter Y**
* danbooru.donmai.us (NSFW subset in Q) **D**
* e621.net (also with Q) **E** FURRY!
* zerochan.net **Z**

**155,055** images are sorted and zipped according to aspect ratio (dimensions mapped to folders), _priorities high to low_:

- **45449** "artbook pages" **7x10 (+/- 4%)**
- **25433** "wide pages" **3x4 (+/- 10%)**
- **29392** "squares" **1x1 (+/- 20%)**
- **28216** "wallpapers and computer screens" **3x2 (+/- 40%)**
- **26485** "high pages" **2x3 (+/- 40%)**

and also according to _**source**_ and (sometimes) _**ID range**_, as mentioned in the _**folder/archive name**_.

You can browse pictures directly in the archives with FastStone MaxView or something like it.

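
The aspect-ratio buckets listed above can be sketched in Python as follows. This is only an illustration, not the pipeline actually used for the release: it assumes the percentage is a relative tolerance on width/height around the nominal ratio, and that an image goes to the first matching bucket in the stated priority order. The names `BUCKETS` and `aspect_bucket` are hypothetical. Under this reading the 3x2 and 2x3 buckets together span roughly 0.4 to 2.1, which agrees with the aspect ratio range given under "Transformations and filters".

```python
# Nominal width/height ratio and relative tolerance for each folder,
# in the priority order given in the description (high to low).
# ASSUMPTION: "+/- N%" is read as a relative tolerance on width/height.
BUCKETS = [
    ("7x10", 7 / 10, 0.04),  # artbook pages
    ("3x4", 3 / 4, 0.10),    # wide pages
    ("1x1", 1 / 1, 0.20),    # squares
    ("3x2", 3 / 2, 0.40),    # wallpapers and computer screens
    ("2x3", 2 / 3, 0.40),    # high pages
]


def aspect_bucket(width, height):
    """Return the first bucket whose tolerance band contains width/height,
    or None if the image fits no bucket."""
    ratio = width / height
    for name, nominal, tolerance in BUCKETS:
        if abs(ratio - nominal) <= nominal * tolerance:
            return name
    return None
```

For example, `aspect_bucket(1920, 1080)` returns `"3x2"` and `aspect_bucket(1400, 2000)` returns `"7x10"`, while a 3:1 strip returns `None`.
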
File name structure: **%website% - %id% - %up_to_3_copyrights% ~ %up_to_5_characters% (%up_to_2_artists%).%ext%** where

- %copyright%, %character% and %artist% may be used as search filters on the source booru
- %website% + %id% is unique and may also be used to get the direct booru URL

so you can extract subsets of interest with xcopy (from already unzipped images) or by unzipping (from the release on the fly), e.g.

```
rem Extract matching files from every archive of the release
for %%F in ("d:\Safebooru 2023b\*.zip") do 7z x -r -o"e:\sortarea\" "%%F" *spy*family*
rem Or copy matching files from already unzipped images (source quoted: the path contains a space)
xcopy /s "d:\Safebooru 2023b\*spy*family*" e:\sortarea
```

**Transformations and filters:**

- initially filtered to Mpixels >= 1.2, width >= 900, height >= 900
- PNG converted to JPG (quality 94%), no animations
- downsized to 60 Mpix and/or a maximum side of 9000 px; images of 18 MB and larger carefully mogrified
- tall/wide stripes dropped or adjusted to aspect ratio 0.4 .. 2.1
- manually (yep, plenty of ~~hand~~job behind this release)
- comics and 4koma, segmented scans and overtexted covers filtered out
- real-life photos, no-character landscapes, most line art and primitive chibi thrown away
- too explicit images (uncensored nipples or vulva, obvious adult actions etc.) excluded from the "questionable" downloads
- crops done where the background was large and simple or dirty; most artbooks de-bordered
- occasionally gamma correction, denoising and other nontrivial improvements made
- carefully deduplicated (with AntiDupl.NET, up to 4% similarity) along with several past releases

Some meta-information is included in tab-delimited files with an evident header line:

- **V2023B_files.tsv** post info (size, resolution, MD5 etc.) with concatenated copyright / character / artist tags (can be opened in Excel)
- **V2023B_tags.tsv** all tags (incl. general and meta), one tag per line (4,239,637 rows, too many for Excel)

With these loaded into a database you can use SQL to generate xcopy commands (copy-paste the query result and run it against already unzipped images) for anything you want, e.g.
```
select 'xcopy "d:\'||torr_path||'\'||file_name||'" e:\sortarea ' xc
from files f
join tags t on t.booru=f.booru and t.fid=f.fid
where t.tag like '%never_seen_a_guy_recreate_this_successfully%' -- memetic
```

NOTE1: volume 2023C (through 08.2023, without zerochan) is on the way; **no more rips are planned** after it.

NOTE2: the **final** sampled dataset BOORU CHARS 2023 will consist of 2022C+D, 2023A+B+C, some old stuff and **a lot of consolidated metadata for the whole project**.
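
To run a query like the one above, the two TSV files first have to be loaded into a database. Below is a minimal sketch using Python's built-in `csv` and `sqlite3` modules. It takes the column names from the header line, so it assumes nothing about the actual columns. The function name `load_tsv` is hypothetical, and it assumes every row has as many fields as the header.

```python
import csv
import sqlite3


def load_tsv(conn, table, lines):
    """Create `table` from a tab-delimited source with a header line.

    `lines` is any iterable of text lines, e.g. an open file object.
    ASSUMPTION: every data row has exactly as many fields as the header.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Quote identifiers so unusual header names cannot break the statement.
    columns = ", ".join('"%s"' % name.replace('"', '""') for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), reader)
    conn.commit()
```

Typical use would be to open a file with `open(path, newline="", encoding="utf-8")` and pass the file object as `lines`, once for a `files` table and once for a `tags` table. The UTF-8 encoding is an assumption. Columns are created without a declared type, so values are stored as text.
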

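The file name structure described above can also be taken apart programmatically. This is a rough sketch with a regular expression, assuming the separators are exactly ` - `, ` ~ ` and a final parenthesised artist field. The example file name in the usage note is made up, and how multiple tags are separated inside one field is not stated in the description, so each field is returned as a raw string. Names such as `NAME_PATTERN` and `parse_name` are hypothetical.

```python
import re

# %website% - %id% - %copyrights% ~ %characters% (%artists%).%ext%
# ASSUMPTION: the artist field itself contains no parentheses.
NAME_PATTERN = re.compile(
    r"^(?P<website>\S+) - (?P<id>\d+) - (?P<copyrights>.*?) ~ "
    r"(?P<characters>.*) \((?P<artists>[^()]*)\)\.(?P<ext>\w+)$"
)


def parse_name(file_name):
    """Split a release file name into its fields; None if it does not match."""
    match = NAME_PATTERN.match(file_name)
    return match.groupdict() if match else None
```

With the made-up name `safebooru - 4300001 - spy_x_family ~ anya_(spy_x_family) (some_artist).jpg`, `parse_name` yields `website` `safebooru`, `id` `4300001`, `characters` `anya_(spy_x_family)` and `artists` `some_artist`.
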
File list

  • Safebooru 2023b
    • 1x1.d.615.zip (2.8 GiB)
    • 1x1.d.625.zip (4.1 GiB)
    • 1x1.d.q.zip (891.9 MiB)
    • 1x1.e.q.zip (2.5 GiB)
    • 1x1.e.zip (2.7 GiB)
    • 1x1.sb.430.zip (2.4 GiB)
    • 1x1.sb.432.zip (2.4 GiB)
    • 1x1.sb.434.zip (2.4 GiB)
    • 1x1.sb.436.zip (2.6 GiB)
    • 1x1.sb.438.zip (2.4 GiB)
    • 1x1.y.q.zip (501.8 MiB)
    • 1x1.y.zip (312.4 MiB)
    • 1x1.z.zip (3.0 GiB)
    • 2x3.d.615.zip (6.7 GiB)
    • 2x3.d.625.zip (5.7 GiB)
    • 2x3.d.633.zip (5.1 GiB)
    • 2x3.d.q.zip (1.8 GiB)
    • 2x3.e.q.zip (897.5 MiB)
    • 2x3.e.zip (932.8 MiB)
    • 2x3.sb.430.zip (2.6 GiB)
    • 2x3.sb.432.zip (2.6 GiB)
    • 2x3.sb.434.zip (2.5 GiB)
    • 2x3.sb.436.zip (2.7 GiB)
    • 2x3.sb.438.zip (2.3 GiB)
    • 2x3.y.q.zip (6.7 GiB)
    • 2x3.y.zip (2.0 GiB)
    • 2x3.z.zip (4.1 GiB)
    • 3x2.d.615.zip (3.9 GiB)
    • 3x2.d.625.zip (6.2 GiB)
    • 3x2.d.q.zip (1.2 GiB)
    • 3x2.e.q.zip (1.9 GiB)
    • 3x2.e.zip (2.5 GiB)
    • 3x2.sb.430.zip (3.5 GiB)
    • 3x2.sb.432.zip (3.5 GiB)
    • 3x2.sb.434.zip (3.4 GiB)
    • 3x2.sb.436.zip (3.6 GiB)
    • 3x2.sb.438.zip (3.5 GiB)
    • 3x2.y.q.zip (2.4 GiB)
    • 3x2.y.zip (1.5 GiB)
    • 3x2.z.zip (4.2 GiB)
    • 3x4.d.615.zip (5.1 GiB)
    • 3x4.d.625.zip (5.0 GiB)
    • 3x4.d.q.zip (1.1 GiB)
    • 3x4.e.q.zip (1.6 GiB)
    • 3x4.e.zip (1.5 GiB)
    • 3x4.sb.430.zip (2.2 GiB)
    • 3x4.sb.432.zip (2.3 GiB)
    • 3x4.sb.434.zip (2.2 GiB)
    • 3x4.sb.436.zip (2.5 GiB)
    • 3x4.sb.438.zip (2.4 GiB)
    • 3x4.y.q.zip (1.2 GiB)
    • 3x4.y.zip (518.8 MiB)
    • 3x4.z.zip (2.9 GiB)
    • 7x10.d.615.zip (5.0 GiB)
    • 7x10.d.620.zip (2.8 GiB)
    • 7x10.d.625.zip (5.2 GiB)
    • 7x10.d.630.zip (5.1 GiB)
    • 7x10.d.635.zip (5.7 GiB)
    • 7x10.d.q.zip (3.3 GiB)
    • 7x10.e.q.zip (1.5 GiB)
    • 7x10.e.zip (1.9 GiB)
    • 7x10.sb.430.zip (5.0 GiB)
    • 7x10.sb.432.zip (5.1 GiB)
    • 7x10.sb.434.zip (5.1 GiB)
    • 7x10.sb.436.zip (5.1 GiB)
    • 7x10.sb.438.zip (5.2 GiB)
    • 7x10.y.q.zip (6.1 GiB)
    • 7x10.y.zip (2.4 GiB)
    • 7x10.z.391.zip (5.2 GiB)
    • 7x10.z.394.zip (3.4 GiB)
    • V2023B_files.tsv (41.4 MiB)
    • V2023B_tags.tsv (216.2 MiB)