Make REAM models for just one or two programming languages

#6
by zotona0 - opened

I see that https://huggingface.co/datasets/bigcode/the-stack-smol was used for calibration.
We could filter the dataset by its lang field.
Would it be possible to do that while preserving near-original performance?

We used a random subset without any filtering and are not planning to try filtering at the moment, but we hope to release the code soon so that others can try it.
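For anyone who wants to try the filtering idea in the meantime, here is a minimal sketch of selecting calibration samples by the lang field. It is not the REAM calibration code, which has not been released; the helper names `filter_by_lang` and `sample_calibration`, the sample count, and the toy records are all hypothetical. It assumes each record is a dict with `lang` and `content` keys, and it compares language names case-insensitively because the exact values stored in the dataset are not confirmed here.

```python
import random
from typing import Dict, Iterable, Iterator, List


def filter_by_lang(records: Iterable[Dict], keep_langs: Iterable[str]) -> Iterator[Dict]:
    """Yield only the records whose 'lang' field matches one of keep_langs.

    Matching is case-insensitive (an assumption, since the exact spelling of
    language names in the dataset is not verified here).
    """
    keep = {lang.lower() for lang in keep_langs}
    for rec in records:
        if str(rec.get("lang", "")).lower() in keep:
            yield rec


def sample_calibration(records: Iterable[Dict], keep_langs: Iterable[str],
                       n: int, seed: int = 0) -> List[Dict]:
    """Draw up to n random records restricted to the chosen languages."""
    pool = list(filter_by_lang(records, keep_langs))
    if len(pool) <= n:
        return pool
    return random.Random(seed).sample(pool, n)


if __name__ == "__main__":
    # Toy stand-in records; real rows would come from the dataset itself,
    # e.g. by iterating over it with the Hugging Face `datasets` library.
    toy = [
        {"lang": "Python", "content": "print('hi')"},
        {"lang": "Rust", "content": "fn main() {}"},
        {"lang": "Java", "content": "class A {}"},
        {"lang": "Python", "content": "x = 1"},
    ]
    subset = sample_calibration(toy, ["python", "rust"], n=2, seed=0)
    print([r["lang"] for r in subset])
```

Whether a model calibrated on such a subset keeps near-original performance is still an open question; this only shows how the subset could be built.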

bknyaz changed discussion status to closed
