Code for training and evaluating TDANN language models. See the TopoLM paper.
- model specification / training code is in models
- model evaluation scripts are in eval
- fMRI data + analysis scripts are in fMRI
- figures stores plots generated by eval scripts
you can install all Python dependencies with pip3 install -r requirements.txt. the main ones are:
- torch v2.x (for flash attn + model compilation)
- huggingface datasets (for fineweb / owt), tiktoken (for bpe)
- omegaconf, wandb, tqdm
- matplotlib, seaborn
- esda, libpysal (for Moran's I computation)
- pandas, numpy v2.x, scipy
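
Since the list above pins torch and numpy to v2.x, a quick check of the installed major versions can catch environment mismatches before training. The following is a minimal sketch and not part of this repository: the helper names `major_version` and `check_major` and the `REQUIRED_MAJOR` table are hypothetical, and it assumes the packages are installed under the distribution names `torch` and `numpy`.

```python
from importlib.metadata import PackageNotFoundError, version

# Major versions taken from the dependency list above (assumed distribution names).
REQUIRED_MAJOR = {"torch": 2, "numpy": 2}


def major_version(version_string):
    """Return the leading integer of a version string such as '2.1.0+cu121'."""
    return int(version_string.split(".")[0])


def check_major(dist_name, required_major):
    """Return True if dist_name is installed with the required major version."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        # Package is not installed in the current environment.
        return False
    return major_version(installed) == required_major


if __name__ == "__main__":
    for name, major in REQUIRED_MAJOR.items():
        ok = check_major(name, major)
        status = "ok" if ok else "missing or wrong major version"
        print(f"{name} (need v{major}.x): {status}")
```

Running the script after pip3 install -r requirements.txt prints one line per package; anything other than "ok" suggests the environment does not match the versions listed above.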
you can access the models we trained / analyzed in the paper here.