kentjzhu
AI & ML interests
None yet
Recent Activity
reacted to abdurrahmanbutler's post 3 days ago
Introducing Kanon 2 Enricher: the world's first hierarchical graphitization model
Today we're publicly releasing Kanon 2 Enricher, and with it, an entirely new class of AI model that we're calling a hierarchical graphitization model.
This is fundamentally different from both universal extraction models and generative models.
As a hierarchical graphitization model, Kanon 2 Enricher natively outputs a knowledge graph rather than tokens, which makes it architecturally incapable of hallucinating or inventing text that wasn't present in the input.
What that enables in practice is unlike any other model or ML architecture on the market:
• No hallucinations
It cannot hallucinate. All references and links are stored as spans, meaning exact character offsets anchored to the original text.
• Hierarchical segmentation, not just extraction
It deconstructs a document's full nested hierarchy, down to chapters, sections, clauses, schedules, signatures, and even singular sentences, and classifies each span with dozens of contextual features.
• Entity extraction, disambiguation, and linking
It resolves what references actually point to, then links entities, citations, and cross-references into a single coherent graph.
• Graph-first efficiency
Small enough to run locally on a consumer PC with sub-second latency, and it stays reliable on long documents where front…
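The span-anchored design described above can be sketched in a few lines. This is a hypothetical illustration, not the actual Kanon 2 Enricher output schema: the key idea is that every graph node stores character offsets into the source document, so its surface text is always a slice of the input and can never be invented.

```python
from dataclasses import dataclass

# Hypothetical sketch of a span-anchored graph node: the model emits
# offsets into the original text, never freshly generated tokens.
@dataclass
class Span:
    start: int   # inclusive character offset into the original document
    end: int     # exclusive character offset
    label: str   # e.g. "clause", "citation", "signature" (illustrative labels)

    def text(self, document: str) -> str:
        # The node's text is always a slice of the input, so it
        # cannot contain content that was not present in the document.
        return document[self.start:self.end]

document = "Section 1. The Supplier shall deliver the Goods."
heading = Span(start=0, end=10, label="section_heading")
print(heading.text(document))  # prints "Section 1."
```

Because references are offsets rather than strings, any link in the graph can be verified by re-reading the original document at those positions.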
To read more about our new model, check out our latest Hugging Face article:
https://huggingface.co/blog/isaacus/introducing-kanon-2-enricher
reacted to vincentg64's post over 1 year ago
No-Code LLM Fine-Tuning and Debugging in Real Time: Case Study
Full doc at https://mltblog.com/47DisG5
Have you tried the xLLM web API? It allows you to fine-tune and debug an agentic multi-LLM in real time. The input data is part of the anonymized corporate corpus of a Fortune 100 company, covering AI policies, documentation, integration, best practices, references, onboarding, and so on. This demo features one sub-LLM; the full corpus is broken down into 15 sub-LLMs.
One of the goals is to return concise but exhaustive results, using acronyms (a specific table for each sub-LLM) to map multi-tokens found in prompts but not in the corpus to multi-tokens that are in the corpus. Exhaustivity is the most overlooked metric when evaluating LLMs designed for search and retrieval. Using xLLM in combination with another LLM is one of the best approaches, and each can be used to evaluate the other. Thanks to fast in-memory processing, no weights, and no training, the xLLM web API is one of a kind, with capabilities not found in any competing product, free or not.
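The acronym-mapping step described above can be sketched as a simple per-sub-LLM lookup table. The table entries and function name here are hypothetical illustrations, not the real xLLM implementation: the point is that prompt terms absent from the corpus are rewritten into their in-corpus multi-token equivalents before retrieval.

```python
# Illustrative acronym table for one sub-LLM (entries are assumptions,
# not taken from the actual xLLM corpus).
acronym_table = {
    "llm": "large language model",
    "rag": "retrieval augmented generation",
}

def normalize_prompt(prompt: str, table: dict) -> str:
    # Replace each whole-word acronym in the prompt with the
    # multi-token form that actually appears in the corpus.
    words = prompt.lower().split()
    return " ".join(table.get(w, w) for w in words)

print(normalize_prompt("best practices for RAG", acronym_table))
# prints "best practices for retrieval augmented generation"
```

A real system would also need to handle punctuation and multi-word acronym keys, but the lookup-and-substitute idea is the same.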
Read more at https://mltblog.com/47DisG5