DeepSeek OCR Triton Error [CUDA] an illegal memory access on vLLM 0.11.2
#102
by
blazarev
- opened
For certain images only (as far as I observed if mixed with handwritten and digital letters in the same image, but not on all...), I get thrown an error for illegal memory access. Sometimes, for the same image I get thrown RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx
The server crashes for every request on the same image, but works fine for other images.
When I send the same image on a local HF version of the model, the image is processed
I am using an OpenAI call to the deployed model on vllm.