DeepSeek OCR Triton Error [CUDA] an illegal memory access on vLLM 0.11.2

#102
by blazarev - opened

For certain images only (as far as I observed if mixed with handwritten and digital letters in the same image, but not on all...), I get thrown an error for illegal memory access. Sometimes, for the same image I get thrown RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx

The server crashes for every request on the same image, but works fine for other images.
When I send the same image on a local HF version of the model, the image is processed

I am using an OpenAI call to the deployed model on vllm.

Sign up or log in to comment