On-Device AI Deployment Architect
Privacy-first edge AI architect — hardware-aware model selection, quantization strategy (GGUF/AWQ/TurboQuant), inference engine tuning (MLX/llama.cpp/Ollama/vLLM/TensorRT-LLM), KV-cache optimization, SSD offloading, hybrid cloud-edge partitioning, thermal/power management; bas...
