Clelia Astra Bertelli
as-cle-bert
AI & ML interests
Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source
Recent Activity
liked a dataset 13 days ago: llamaindex/liteparse_bench_small
liked a model 8 months ago: facebook/dinov2-small
posted an update about 1 year ago:
Let's pipe some 𝗱𝗮𝘁𝗮 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝘄𝗲𝗯 into our vector database, shall we?🤠
With 𝐢𝐧𝐠𝐞𝐬𝐭-𝐚𝐧𝐲𝐭𝐡𝐢𝐧𝐠 𝐯𝟏.𝟑.𝟎 (https://github.com/AstraBert/ingest-anything) you can now start from nothing but URLs: scrape the content, extract its text, chunk it and load it into your favorite LlamaIndex-compatible vector database!🕸️
This is possible thanks to 𝗰𝗿𝗮𝘄𝗹𝗲𝗲 by Apify, an open-source crawling library for Python and JavaScript that handles the whole data flow from the web. ingest-anything then combines it with 𝗕𝗲𝗮𝘂𝘁𝗶𝗳𝘂𝗹𝗦𝗼𝘂𝗽, 𝗣𝗱𝗳𝗜𝘁𝗗𝗼𝘄𝗻 and 𝗣𝘆𝗠𝘂𝗣𝗱𝗳 to parse the scraped HTML, convert it to PDF and extract the text, hassle-free!😸
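Curious what the "extract the text, then chunk it" part boils down to? Here is a minimal sketch in plain Python using only the standard library. It is NOT ingest-anything's API: `html_to_text` and `chunk_words` are hypothetical helpers, the HTML is parsed directly with `html.parser` instead of going through the PdfItDown/PyMuPDF PDF round-trip, and chunking is a simple fixed-size word window with overlap rather than the Chonkie chunkers ingest-anything relies on.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents.

    Hypothetical helper for illustration; ingest-anything itself uses
    BeautifulSoup + PdfItDown + PyMuPDF instead.
    """

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical helper: turn an HTML string into space-joined visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Hypothetical helper: split text into windows of `size` words,
    each sharing `overlap` words with the previous window."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


page = (
    "<html><head><style>p{}</style></head>"
    "<body><h1>Hello</h1><p>web data</p><script>var x=1;</script></body></html>"
)
print(html_to_text(page))  # Hello web data
print(chunk_words("a b c d e f g", size=3, overlap=1))  # ['a b c', 'c d e', 'e f g']
```

In a real pipeline each chunk would then be embedded and written to the vector database; the overlap keeps some context shared between neighbouring chunks so that sentences cut at a boundary are not lost entirely.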
Check the attached code snippet if you're curious about how to get started🎬
PS: Don't tell anybody, but this release hides another gem... It now supports OpenAI models for agentic chunking, following the latest Chonkie releases🦛✨
If you don't want to miss out on the new features, leave us a little star on GitHub ➡️ https://github.com/AstraBert/ingest-anything
And join our Discord community! ➡️ https://discord.gg/kDqHNjks