DeepSeek OCR Triton Error [CUDA] an illegal memory access on vLLM 0.11.2

#102

by blazarev - opened 28 days ago

28 days ago

For certain images only (as far as I observed if mixed with handwritten and digital letters in the same image, but not on all...), I get thrown an error for illegal memory access. Sometimes, for the same image I get thrown RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx

The server crashes for every request on the same image, but works fine for other images.
When I send the same image on a local HF version of the model, the image is processed

I am using an OpenAI call to the deployed model on vllm.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment