Build with Logix.
API-first model compression and deployment. From ingestion to edge delivery in a single pipeline. Python SDK, Swift runtime, CLI tools.
API Reference
Deploy quantized models to any target with a single function call.
Deploy a Model
Python: Deploy a quantized model to any target device with one call.
import logix

# Initialize the Logix client
client = logix.Client(api_key="lx-sk-...")

# Deploy a quantized medical model to iPhone
deployment = client.deploy(
    model_id="logix-phi4-medical-v3",
    target="iphone_16_pro",
    precision="int4",
    runtime="coreml",
    config={
        "max_context_length": 4096,
        "batch_size": 1,
        "thermal_profile": "sustained"
    }
)

print(f"Deployed: {deployment.id}")
print(f"Artifact size: {deployment.artifact_size_mb}MB")
print(f"Estimated latency: {deployment.est_latency_ms}ms")

Run Inference
Swift: Run on-device inference with the deployed model.
// Swift — On-Device Inference (CoreML)
import LogixRuntime

let model = try LogixModel(
    artifact: "logix-phi4-medical-v3.mlpackage",
    precision: .int4
)

let result = try await model.generate(
    prompt: "Summarize the patient's cardiac markers",
    context: patientRecord.deidentified(),
    config: .init(
        maxTokens: 256,
        temperature: 0.1, // Low temperature for clinical precision
        topP: 0.9
    )
)

print(result.text)
// Latency: 23ms | Tokens/sec: 42 | Memory: 1.2GB

One CLI.
Full pipeline.
The Logix CLI handles the entire model lifecycle — from ingestion and distillation to pruning, quantization, benchmarking, and deployment. Install it and own the pipeline.
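As a conceptual aside (this is not the Logix implementation), the distillation stage of that lifecycle can be sketched as minimizing the KL divergence between the teacher's and the student's temperature-softened output distributions, in the style of classic knowledge distillation. The logits below are made-up stand-ins for real model outputs:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of raw logits."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions.

    Zero when the two distributions match exactly; always non-negative.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Made-up logits for one token position
teacher = [3.2, 1.1, -0.5]
student = [2.8, 1.4, -0.2]
loss = kd_loss(teacher, student)  # small positive value; shrinks as student matches teacher
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of unlikely tokens rather than only its top choice.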
$ pip install logix-cli
# Initialize a new Logix project with default config
logix init my-medical-slm

# Ingest a base model and training dataset
logix ingest --model meta-llama/Llama-3.2-1B --dataset ./clinical_notes/

# Run recursive distillation with GPT-5 as teacher
logix distill --teacher gpt-5 --student ./student_1B/ --cycles 47

# Quantize the refined model to INT4 with AWQ
logix quantize --precision int4 --method awq --calibration ./cal_set/

# Apply structured pruning at 30% sparsity
logix prune --sparsity 0.3 --structured --preserve-heads 0-7

# Run comprehensive benchmark suite on target hardware
logix benchmark --suite mmlu,hellaswag,medqa --device m4-pro

# Package and deploy to NVIDIA Orin via OTA
logix deploy --target nvidia_orin --runtime tensorrt --package ota
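For intuition about what the INT4 step does, symmetric absmax quantization can be sketched in a few lines. This is a toy illustration, not Logix's actual quantizer: AWQ is activation-aware and works per-channel, while this sketch uses a single per-tensor scale and arbitrary example weights:

```python
def quantize_int4(weights):
    """Quantize floats to signed 4-bit integers with a per-tensor absmax scale.

    INT4 covers -8..7; the symmetric scheme maps the largest-magnitude
    weight to +/-7 so positive and negative ranges stay balanced.
    """
    scale = max(abs(w) for w in weights) / 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.42, -1.3, 0.07, 0.9]     # arbitrary example values
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
# Each restored weight is within scale/2 of the original
```

The reconstruction error per weight is bounded by half the scale, which is why calibration data (the `--calibration ./cal_set/` flag above) matters: it lets the quantizer pick scales that keep error low where the model is most sensitive.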