Boson AI is looking for an experienced Network Engineer to design and optimize high-performance networking infrastructure for AI/ML operations. The role involves managing advanced networking technologies and ensuring peak performance for complex systems.
About The Role

We're seeking an experienced Network Engineer to design, build, and optimize the high-performance networking infrastructure powering our AI/ML operations in Toronto. You'll work at the cutting edge of network technology, managing InfiniBand and ultra-high-speed Ethernet fabrics that connect NVIDIA H100 and A100 GPUs, over 20 PB of Ceph storage, and hundreds of servers.

You'll be hands-on with the full lifecycle of our network infrastructure: planning, building, testing, deploying, and keeping everything running at peak performance. That means troubleshooting issues as they arise, monitoring network performance and throughput, developing automation to streamline operations, and working closely with HPC and ML teams to ensure they have the bandwidth they need. You'll also help us plan for future capacity and evaluate emerging network technologies as we scale to meet increasingly demanding workloads.

Responsibilities
• Configure and maintain InfiniBand and high-speed Ethernet fabrics
• Optimize network performance for RDMA and GPU-to-GPU communication
• Manage network switches (Mellanox, NVIDIA, Micas Networks)
• Troubleshoot network bottlenecks and latency issues
• Plan and execute network upgrades and expansions
• Implement network security (firewalls, VLANs, ACLs)
• Collaborate on storage network optimization
• Monitor infrastructure

Minimum Qualifications
• 4+ years of network engineering experience in production environments
• Strong understanding of L2/L3 networking protocols (TCP/IP, BGP, OSPF, VLANs)
• Hands-on experience with high-speed networking (100Gb+ Ethernet and InfiniBand)
• Hands-on experience with network security (firewalls, ACLs, network segmentation)
• Knowledge of HPC network topologies
• Experience with InfiniBand fabrics and related technologies, including RDMA, RoCE, and IPoIB
• Strong troubleshooting and problem-solving skills

Preferred Qualifications
• Experience in data center environments or AI/ML infrastructure
• Hands-on experience with high-performance Ethernet switches (e.g., Broadcom Tomahawk-based) and the latest InfiniBand switches (e.g., NVIDIA/Mellanox)
• Experience optimizing networks for GPU-to-GPU communication
• Experience with open-source firewall solutions (OPNsense, pfSense, or similar)
• Experience with network automation tools
• Understanding of distributed storage networking (Ceph cluster networks)
• Familiarity with network monitoring and observability tools (Prometheus, Grafana)
• Knowledge of multi-site network connectivity and WAN optimization
• Familiarity with cloud networking on at least one platform (AWS, GCP, or Azure), including VPC design, site-to-site VPN configuration, Direct Connect/ExpressRoute/Cloud Interconnect, hybrid cloud connectivity, and cloud-to-datacenter network integration

If you're a natural problem-solver with a passion for continuous learning, we'd love to hear from you.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
HTC Global Services is seeking a hands-on SDET with expertise in AI/ML testing to develop automated frameworks and ensure model accuracy and reliability. This full-time position is based in Orlando, Florida.
Onebridge is seeking an experienced ML Infrastructure Engineer specializing in Azure AI to design and optimize cloud-based machine learning infrastructure. This hybrid role requires collaboration with data scientists and business stakeholders to operationalize AI models in Indianapolis, Indiana.
Spring Health is seeking an Engineering Manager to lead the AI & ML Infrastructure team in Seattle, Washington. This role focuses on building and scaling AI/ML platforms to enhance mental healthcare delivery.
Management Applications, Inc. is seeking candidates for contract Network Infrastructure Support and Maintenance positions with the PA Department of Human Services in Harrisburg, Pennsylvania. Candidates will be responsible for server, network, and IVR support, along with managing user accounts and providing desktop support.
Onebridge is seeking an ML Infrastructure Engineer (Azure AI) to design and optimize cloud-based machine learning infrastructure. This hybrid role requires collaboration with data scientists and business stakeholders in Indianapolis, IN.