AbstractPhila PRO
Geometric Fusion: Cross-Modal Alignment Through Shared Pentachoron Geometry
AbstractPhil/geolip-procrustes
I encourage EVERYONE who is curious to check my work. Check it, double check it, and triple check it.
These were aligned using COCO and then validated with Flickr. Entirely different datasets. The experts arbitrated and the alignment yielded the correct answers. Preliminary tests show that with almost no alignment requirement, the models can reach 100% R1 retrieval accuracy.
This is not to be confused with validation accuracy for a classification model or a text encoder's text response. It allows multispectral communication between entirely different models for direct downstream consumption, with almost no training for the chosen models.
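For anyone double checking the R1 numbers, here is a minimal sketch of how recall@1 is commonly computed for paired embeddings. It assumes row i of the query matrix and row i of the gallery matrix are a matched pair (for example a caption and its image) and that both already live in the same space. The function name `recall_at_1` and the synthetic data are illustrative assumptions, not code from the repo.

```python
import numpy as np

def recall_at_1(query_emb, gallery_emb):
    """Fraction of queries whose most similar gallery row (by cosine) is the paired row at the same index."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                      # (n_queries, n_gallery) cosine similarities
    nearest = sims.argmax(axis=1)       # index of the best gallery match per query
    return float((nearest == np.arange(len(q))).mean())

# Synthetic stand-ins for real embeddings (assumption: 50 pairs, 16 dims).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(50, 16))
queries = gallery + 0.01 * rng.normal(size=gallery.shape)
print(recall_at_1(queries, gallery))
```

Note that this measures retrieval of the paired item, which is a different quantity from classification accuracy.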
I have a working procrustes experiment that learns adjacent manifolds within a reasonable spectrum, and the speed is... well, 1 epoch on COCO with Bert-Large and DinoV2 is enough for the models to align nearly perfectly. For some scales in the experiment, the 3 epochs I set aren't quite enough to bring R1 to its highest, while many align nearly immediately.
These two were an obvious pair to pick: 60% similarity and >90% spectral similarity.
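For readers unfamiliar with the term, the classical orthogonal Procrustes problem finds the rotation that best maps one set of points onto another. The sketch below shows only that textbook closed-form solution via SVD; it is not the trainer described here. It assumes both embedding sets have already been brought to the same dimensionality, and the names `orthogonal_procrustes`, `text_emb` and `image_emb` are hypothetical.

```python
import numpy as np

def orthogonal_procrustes(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

# Synthetic stand-ins: "image_emb" is a rotated copy of "text_emb" plus small noise.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(200, 8))
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
image_emb = text_emb @ true_rotation + 0.01 * rng.normal(size=(200, 8))

rotation = orthogonal_procrustes(text_emb, image_emb)
residual = np.linalg.norm(text_emb @ rotation - image_emb)
print(f"alignment residual: {residual:.4f}")
```

Because the solution is closed-form, a single pass over paired data is enough to estimate the rotation, which is at least consistent with alignment showing up very early in training.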
The trainer transfers layers, learns embeddings, and more - all by sticking strictly to geometric boundaries and procrustes informational accumulation within a modulation model's constraints.
I have many experiments to run.
After a very long set of days, with multiple setbacks, I have found a potential direction using a type of modulation attention I haven't named yet, in direct association with transformer structural boundaries.
This attention is essentially based on a form of geometric modulation and is gated based on differentiation. This is likely one of the building blocks for a replacement for a hard-trained set of weights - instead formatted into one of the first legitimate safety-nets built specifically for geometric attenuation.
Experiments show a multitude of potential limitations. Those potentials are destroying certain objectives and combining others into new processes, rather than letting the original design sit in concrete. In this structure everything must conform to the math, not the math to everything.
The entire concept here is narrowing down the problem into a regressed solution that makes the most complementary sense to the least potential requirement of hardware in order to achieve the necessary goals.
https://huggingface.co/AbstractPhil/procrustes-analysis
You can find my current task-oriented experimentation stored here. As I deconstruct the models into their subsequent boundaries I accumulate a manifest of information and data. This is entirely meant to build that very same geometric structural awareness that models require to be stable.
I've discovered multiple very tight bottleneck points that are uniform among models across the multitude of analyses I've run. There are some that likely form based on the law of averages, and there are others that form... well, they are mostly the same among all models - but they are not the same for every model, so I can refer to those as semi-constant. I've found some constant spaces and some constant point ranges, but I need to test more models and I need to test larger models.
I must sincerely apologize for not solving this problem quickly.
This will take time. Without the approximator it's going to be considerably slower, but the model I'm beginning to train will provide the approximations in a different way over time. As iterations progress, the system will conform to a huge array of geometric potentials and be capable of predicting those, but it will not be as powerful as the full patchmaker up front, and it will be slow to train.
If I can get my hands on a cluster of A100s or H100s for a measure I'll make a post immediately; until then I must default to the slower process.
I really banked on the smaller version working, but it simply couldn't hold complex topological shape without the correct boundaries being learnable AND endure entropic decay simultaneously. The only way to have a predominant shot at a full geometric shared language is to make those boundaries learnable in the full spectrum of potentials, or at least more than I have placed on it.
I'll be refining my process further in the coming days, and I do apologize for pre-emptively announcing a potential that I have yet to fully explore.
There will be a full upgraded 38 shape geolip patchwork trained asap to fully encompass the Flux 1 AE spectrum, and another trained for SD15, SDXL, and Flux 2's VAE as well. These will accommodate DIRECT complex geometric patchwork learning, but not yet at the scale promised. Autoregression is a complex mistress as many of you know, and I will be spending a great deal of time and compute analyzing all of the information required to build a uniformly useful and powerful autoregression patchwork to utilize as invariance to teaching.
The small model did not breach a certain level of accuracy as required by my specifications, so I've defaulted to harvesting information from AI models until I get the comparative bounds required for a useful topology.
It's too small to just finetune something with ablation. It'll likely lose a huge percentage of its behavior and become highly unstable in unseen ways.
Not to mention it's multimodal, accepting images AND videos for processing... There's no telling what sort of damage shared space will have when trained with ablation reinforcement without providing adjacent behavioral supplementation to it.
QWEN 3.5 Residual Thinking Embeddings: How Language Models Transform Text Through Deliberative Generation
0.8B and I are going to be good friends.
I've managed to condense a prototype to a substantially smaller size, but it's not as accurate as the original due to the generic topology being more challenging. I'm working it out though.
I've figured out many new formulas based on the results of the last, which enable more deterministic projection rather than requiring the learning process to be so dispersed among many different subsystems.
I've also managed to form a 5d deterministic projection scaffold that should enable the entire structure to be even smaller, assuming I can work out the edge cases.
It's considerably cheaper than expected to keep volume valid. This seems like a partial regression for now, but I can improve it a bit before heading back in the original direction. Hopefully it's worth the time spent on the potentially improved, sleeker structure.
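One standard way to check that a simplex has a valid (non-degenerate) volume is the Cayley-Menger determinant, which needs only pairwise distances. The sketch below applies it to a pentachoron (5 vertices, a 4-simplex). Treating "valid volume" as "squared volume comfortably above zero" is an assumption made here for illustration, and `simplex_volume_sq` is a hypothetical helper name, not a function from the repo.

```python
import math
import numpy as np

def simplex_volume_sq(vertices):
    """Squared volume of an n-simplex from its n+1 vertices via the Cayley-Menger determinant."""
    v = np.asarray(vertices, dtype=float)
    n = v.shape[0] - 1                                   # simplex dimension
    d2 = ((v[:, None, :] - v[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    cm = np.ones((n + 2, n + 2))                         # bordered matrix: first row and column are ones
    cm[0, 0] = 0.0
    cm[1:, 1:] = d2
    coeff = (-1) ** (n + 1) / (2 ** n * math.factorial(n) ** 2)
    return coeff * np.linalg.det(cm)

# Pentachoron with one vertex at the origin and four along the unit axes of R^4.
pentachoron = np.vstack([np.zeros(4), np.eye(4)])
vol_sq = simplex_volume_sq(pentachoron)
print(f"squared volume: {vol_sq:.6f}, non-degenerate: {vol_sq > 1e-12}")
```

The cost is a single 6x6 determinant per pentachoron, so a check of this kind is cheap to run per step.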
The smaller one can handle more shapes, considerably more shapes per scene, at a much higher complexity than voxel association. This has drawbacks though, namely these are essentially a gate set for now and the gates aren't perfect. These CAN find the correct potential, however the subprocessing isn't enabled yet, meaning our little 400k param set here is powerful but in a different kind of way.
I've started making pushes to include the missing pieces, so the colab will start to comply with the training regime and the geovocab2 will no longer be required.
The majority of the geovocab2 specific formulas and factories used will be directly represented in the vocabulary directory, which will be optimized to a better state than the originals. They will include both numpy and torch synthesis, as well as numpy and torch optimizations for worker creation and transforms.
With this I will include the more robust shape factory from the original, and expand it to include deformation perturbation. This will be a learned behavior of the model, which will allow the deformation of shapes to be directly aligned and trained in bulk along with multiple overlapping shapes, multiple sectorized shapes, sub-shapes, deviant shapes, and everything related directly to shape pooling rather than using hard-set spectra of shapes projected into space.
These patches will essentially be alignment sectorization in their first states for the first 8-piece prototype of the chunk, as I can train that on the currently available G4 issued by COLAB.
This is a required element for increasing the learner to full definition capacity, and is a required hurdle before the patchwork can be expanded to a full chunk. The experiments are promising leading to this point, and as I snap pieces together from the successful experiments the system will begin to converge exactly where the expectation rests.
After that, it's just a matter of expanding upward to the necessary architecture and introducing the weights in sequential linear interpolative sequencing, which is something transformers are uniquely capable of handling with minimal calculations after the pre-calculations.
So far so good.
I'll be running multiple alucard fusion ablations on the patchwork before defaulting to the dual-stream slit-light superposition crystal topology architecture that I've proven works for the smaller patchmaker. My hope is that I can approximate the behavior in a more concise way without requiring the full spread of geometric globalization, but there are no guarantees yet. This could save a huge chunk of training time if it works, and alucard's internal step scheduling system will have a place. This may cut a huge percentage of the overall followup training, potentially allowing for training on fewer machines. The topology architecture may be fully required, so hopefully I can just avoid it all through some clever math and be done with it.
Avoiding the full multi-tower Beatrix oscillation system would be absolutely fantastic, but I think the predictions afforded by the system may be fully required, and the oscillation system will likely need to be tuned into a new form for this use case as well.
I do apologize for the nasty code, but Claude tends to be very difficult to make cooperate if you drive the code too far from Claude's context window. Much of my organization has helped, though not enough, and Claude DOES afford rapid prototype capacity. The current repo itself houses a mostly incomplete representation of the outcome, but I want to make sure at least SOME of the formulas align before I start pushing further iterations.
Fair organization can be found in the router section of the geofractal router, the hierarchy spectrum of the geovocab, and the entire system of the pytorch-wide-compiler. They are ugly though and evolved in their own way; I just let Claude work sometimes because otherwise it would take 4x as long to organize in a reusable fashion.
MOST of the code compiles, but I believe there are some .item() edge cases in the current code that cause graph breaks. I'm working on it.
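As a generic illustration of the .item() problem (not code from the geolip repo), the sketch below contrasts a function that pulls a scalar into Python with an equivalent that stays in tensor ops. With default settings, torch.compile breaks the graph at `.item()` because the value has to leave the traced graph. Both function names are hypothetical.

```python
import torch

def scale_with_item(x):
    # .item() copies the scalar to Python; the Python-side branch then depends on it.
    # Under torch.compile with default settings this forces a graph break.
    peak = x.abs().max().item()
    return x / peak if peak > 0 else x

def scale_tensor_only(x):
    # Same result, but the scalar stays a 0-dim tensor and the branch becomes torch.where,
    # so the whole function can be traced as one graph.
    peak = x.abs().max()
    safe_peak = torch.where(peak > 0, peak, torch.ones_like(peak))
    return x / safe_peak

x = torch.tensor([[1.0, -4.0], [2.0, 0.5]])
print(torch.allclose(scale_with_item(x), scale_tensor_only(x)))
```

One way to hunt for the remaining cases is to wrap a module with `torch.compile(fn, fullgraph=True)`, which raises an error at the first graph break instead of silently falling back.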
I'll HOPEFULLY be pushing a fairly organized update to the geolip repo this afternoon with a more complete interpretation of the subsystems, but the formulas aren't perfect yet. I have a couple prototype patchmakers in training but they have some bugs. I'll try to keep them organized.
I need to clean up this sewer honestly, the code got nasty. It's more often fast than not. Might be worth porting all classes directly to the geolip repo, which will centralize for AI development rather than have everything out in divergent systems.
In the gaming industry we call this "YOUR PRODUCER IS CONFUSED AND MAD BECAUSE TECH DEBT"