Convert git-lfs md, py, json files to normal git files
It took me a minute, but I managed to get the changes regarding git LFS working. See #7 for details.
A problem that I ran into is that if I also try to store tokenizer.json as a regular git file, Hugging Face rejects the push: it requires files larger than 10 MiB to be stored with git-lfs. I suspect that is why you originally started using git-lfs for more files.
To prepare this PR, I did the following:
- Remove the last 4 lines from .gitattributes that state that *.md, *.json, *.py, and *.DS_Store should be stored as LFS.
- Afterwards, call:
  git add --renormalize *.json
  git add --renormalize *.py
  git add --renormalize *.md
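The first step above can also be scripted. The following is a minimal sketch, not the method used for this PR: the helper name `drop_lfs_patterns` is hypothetical, and it assumes the rules in .gitattributes use the usual `<pattern> filter=lfs diff=lfs merge=lfs -text` form that `git lfs track` writes. The example input is illustrative and is not the repository's actual file.

```python
def drop_lfs_patterns(gitattributes_text, patterns):
    """Return .gitattributes content without the LFS rules for `patterns`.

    Hypothetical helper: a rule is dropped only if its first field is one of
    the given glob patterns AND it carries the `filter=lfs` attribute.
    """
    kept = []
    for line in gitattributes_text.splitlines():
        fields = line.split()
        is_lfs_rule = "filter=lfs" in fields[1:]
        if fields and fields[0] in patterns and is_lfs_rule:
            continue  # drop the rule so matching files become regular git files
        kept.append(line)
    return "".join(line + "\n" for line in kept)


# Illustrative input only; the real .gitattributes has more entries.
example = (
    "*.safetensors filter=lfs diff=lfs merge=lfs -text\n"
    "*.md filter=lfs diff=lfs merge=lfs -text\n"
    "*.json filter=lfs diff=lfs merge=lfs -text\n"
    "*.py filter=lfs diff=lfs merge=lfs -text\n"
    "*.DS_Store filter=lfs diff=lfs merge=lfs -text\n"
)
cleaned = drop_lfs_patterns(example, {"*.md", "*.json", "*.py", "*.DS_Store"})
print(cleaned)
```

Editing .gitattributes alone does not convert files that are already committed as LFS pointers, which is why the `git add --renormalize` calls are still needed afterwards.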
Initially, I also turned tokenizer.json into a normal git file, and the upload got stuck (which is why this PR was open for 3 hours before the first commit). I ran git config --global http.postBuffer 524288000 to get past that. Afterwards, the push failed with this error:
remote: -------------------------------------------------------------------------
remote: Your push was rejected because it contains files larger than 10 MiB.
remote: Please use https://git-lfs.github.com/ to store large files.
remote: See also: https://hf.co/docs/hub/repositories-getting-started#terminal
remote:
remote: Offending files:
remote: - tokenizer.json (ref: refs/pr/8)
remote: -------------------------------------------------------------------------
So I re-included tokenizer.json as git-lfs, and then I could make this PR like normal.
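To catch this kind of rejection before pushing, one option is to scan the working tree for files above the limit. This is a rough sketch under stated assumptions: the helper name `files_over_limit` is hypothetical, the 10 MiB threshold is taken from the error message above, and treating the limit as strictly "larger than" is my reading of that message rather than documented behaviour.

```python
from pathlib import Path

# 10 MiB, taken from the push rejection message; the exact boundary is an assumption.
LIMIT_BYTES = 10 * 1024 * 1024


def files_over_limit(root, limit=LIMIT_BYTES):
    """List files under `root` whose size exceeds `limit` bytes (hypothetical helper)."""
    root = Path(root)
    oversized = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # skip git's own object store
        if path.is_file() and path.stat().st_size > limit:
            oversized.append(relative.as_posix())
    return sorted(oversized)


if __name__ == "__main__":
    # Anything printed here would need to stay in git-lfs.
    for name in files_over_limit("."):
        print(name)
```

In this repository, such a check would be expected to flag tokenizer.json, which is the file that had to remain in git-lfs.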
So: this PR only changes how files are stored; it does not change the content of any file! Once it is merged, I can update #7 so that you can inspect those changes more easily.
- Tom Aarsen