Tesla Robotaxi and the Future of Autonomy
Summary
Tesla unveiled their vision for the future of transportation at the “We, Robot” event, showcasing the Robotaxi and Robovan. The Robotaxi is a sub-$30,000 autonomous vehicle designed for personal use and potentially part of a rideshare network. Tesla’s Full Self Driving (FSD) technology, powered by custom AI hardware and software, is rapidly improving and expanding to more vehicles. While Tesla has a history of overpromising on timelines, recent progress suggests that widespread autonomous driving may be achievable within the next few years.
What’s Important
- Tesla introduced Robotaxi, a sub-$30,000 fully autonomous vehicle with no steering wheel or pedals, designed for personal use and potential ride sharing.
- Tesla’s FSD technology is improving rapidly, with consistent updates and expanding capabilities for existing Tesla vehicles.
- Custom AI hardware (HW3 and HW4) and end-to-end trained models power Tesla’s autonomous driving capabilities. Robotaxi and Model Y demos at the “We, Robot” event were running an early build of FSD 13 on HW4.
- Tesla’s vision-only approach (Tesla Vision) differs from competitors using lidar and radar, but shows promise based on real-world performance.
- While Tesla has missed previous deadlines, recent progress suggests autonomous driving technology is closer to widespread adoption.
- Tesla’s in-car experience and user interface are considered industry-leading, enhancing the overall autonomous driving experience.
Last Thursday, Tesla had their “We, Robot” event. While it was technically a product launch, with a showcase of Robotaxi and Robovan, it was more a showcase of Tesla’s vision of the future: futuristic-looking cars, public spaces freed of parking lots and turned into parks and green space, and cheap, safe transportation for all.
Robotaxi
Before we get into the technical details, feasibility, and reality of it all: Robotaxi! The Tesla Robotaxi is designed to be a sub-$30,000 car with no steering wheel or pedals. It’s controlled entirely digitally and autonomously, with no manual controls aside from manual releases for the doors and windows in case of emergency. There is no physical charge port; instead, it charges wirelessly via induction. The Robotaxi will automatically align with the induction coil to start charging, and should achieve around 200 miles of range.
Robotaxi will be a consumer product, available to buy for your own personal use. Tesla didn’t give official range estimates, discuss logistics, or show off much beyond a demo of the vehicle, a mention of AI5 hardware (which refers to both the inference silicon and the camera array), and some brief notes on supply chain.
The Robotaxi and Robovan will share parts with the current Model S, 3, X, Y, and Cybertruck to keep costs down and scale production. This includes the aforementioned AI5 hardware as well as physical parts: the side panels are the same as the 2024 Model 3, the door-open buttons are taken from the Cybertruck, and the hinges and panels come from the Model 3/Y. With all of this in mind, I have no doubt they’ll hit the sub-$30,000 price tag mentioned on stage, especially with the rumored 40kWh (36kWh usable) battery, 200-mile range, and efficiency target of 5.5 miles per kWh.
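For what it’s worth, those rumored numbers hang together. A quick back-of-envelope check (every figure here is the rumor above, not a confirmed spec):

```python
# Back-of-envelope check on the rumored Robotaxi battery math.
# Every figure is a rumor cited above, not a confirmed Tesla spec.
usable_kwh = 36.0   # rumored usable capacity of the 40 kWh pack
mi_per_kwh = 5.5    # rumored efficiency target

estimated_range_mi = usable_kwh * mi_per_kwh
print(f"{estimated_range_mi:.0f} miles")  # ~198, right at the ~200-mile figure
```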
Why a two-seat vehicle? According to the principal engineer on the Robotaxi, 82% of miles traveled in the US are with two or fewer passengers. That makes sense when designing a mass-market, high-volume autonomous car: you want the most utility per dollar, and a two-seater delivers it, lowering cost per ride and cost per mile while putting more cars on the road.
Ride-share vs. Consumer Vehicle
One thing I want to make clear from this event is that Tesla did not formally announce a car-share network or a competitor to Waymo/Uber. While they’ve shown teasers and demos in the past, and referenced it at the event with photos and videos, they never announced it outright. It’s happening, Tesla has said it’s happening, and they partly showed it, but it went unannounced. The Robotaxi network would be a Waymo competitor; the Robotaxi itself would replace human drivers in consumer cars outside of a ride-share network.
In some of the slides where they showed Robotaxi, it specifically says “Most Affordable,” “Cheapest to Operate,” and “Always Your Car.” It’s just odd messaging, and it takes the context of previous Tesla earnings calls, AI days, investor decks, and Elon tweets to really put it together.
From my understanding, there will be two pools of Robotaxis (for simplicity’s sake, I’ll call the vehicle “Cybertaxi” and the network of autonomous vehicles “robotaxi” from here on out): Tesla-owned and consumer-owned. These will function in an autonomous ride-share service, similar to Waymo. A Cybertaxi can be purchased and managed as part of a fleet built for the robotaxi network, or you can have it join the network only while you’re not using it, or never have it join at all and keep it exclusively as your personal driver. Every Model S, 3, X, Y, and Cybertruck seems to be capable of joining this autonomous ride-share network and functioning as a robotaxi, with a specific focus on the Model 3 and Model Y first. Again, this appears to be a choice: if you own a Tesla, you have the option to share it as part of the network, but you don’t have to. I like that choice. You could, in theory, have the car pay for itself while you’re not using it. That’s an insane concept.
The other consideration is the number of passengers. A Cybertaxi is a two-seat vehicle, but that doesn’t mean a robotaxi has to be a Cybertaxi. You could, in theory, call a 2-, 5-, or 7-seat vehicle depending on what you need. The Model S, 3, X, Y, and Cybertruck all come with 5 seats standard, and the Model X and Y have 7-seat options. There are plenty of options across Tesla’s existing and future vehicles.
Tesla AI Hardware
Now, to the AI stuff. What makes this all possible is a new Unsupervised Full Self Driving model, which should be available in Texas and California next year and expand as regulators approve it elsewhere. It will come to current Teslas with HW3 and HW4, so essentially any Model S and X since 2019 or any Model 3 since 2017. Any Tesla since about 2016 with AP2/AP2.5 will be eligible with a retrofit to HW3. That makes around 6 million cars already on the road eligible with either just a software update or a hardware-and-software update.
All of this is in theory, of course. Tesla’s supervised FSD model isn’t perfect and is continuously being changed and improved. It’s an end-to-end trained stack, with video feeds from the vehicle’s 7-8 cameras plus driver inputs as the training data, all sourced from the aforementioned 6 million cars on the road. The final model takes video in from all the cameras and puts vehicle controls out. It all runs on custom silicon designed by the Tesla AI team specifically for autonomous inference. Once upon a time (until 2019 on the Model S & X) Tesla used Nvidia DRIVE PX 2, but switched to in-house silicon as their models became more advanced.
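To make “video in, vehicle controls out” concrete, here’s a minimal sketch of what such an end-to-end policy could look like. To be clear, this is my own toy illustration, not Tesla’s architecture; every layer, shape, and name is invented.

```python
import torch
import torch.nn as nn

class EndToEndDrivingPolicy(nn.Module):
    """Toy sketch of a "video in, controls out" policy.

    NOT Tesla's architecture: the layers, shapes, and names are invented
    purely to show the end-to-end idea, i.e. camera frames go in and
    steering/acceleration commands come out, with no hand-coded
    perception -> planning -> control pipeline in between.
    """

    def __init__(self, num_cameras: int = 8):
        super().__init__()
        # Shared vision backbone applied to every camera frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse the per-camera features and regress control outputs.
        self.head = nn.Sequential(
            nn.Linear(64 * num_cameras, 128), nn.ReLU(),
            nn.Linear(128, 2),  # [steering angle, acceleration]
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, cameras, channels, height, width)
        b, n, c, h, w = frames.shape
        feats = self.backbone(frames.view(b * n, c, h, w)).view(b, -1)
        return self.head(feats)

policy = EndToEndDrivingPolicy()
controls = policy(torch.randn(1, 8, 3, 224, 224))  # one "video in, controls out" step
```

Training then amounts to supervised imitation: the video from those millions of cars is the input, and what the human driver actually did is the label.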
What I find interesting about the hardware (don’t worry, I’ll get to the AI part in a second) is how it all works toward the end goal. While Tesla isn’t very open about how it all works, the general understanding is that there are two computers in each car: the FSD computer and the system computer. The system computer powers the infotainment system, car keys, Bluetooth, media, nav, etc. It’s powered by an AMD Ryzen APU in new cars, Intel in older ones. The Model 3, Y, and Cybertruck have 8GB of RAM; the S and X have 16GB. I tend to think separating these is a good idea: if one system goes down, you don’t lose access to the core vehicle systems.
The Tesla FSD computer is a custom chip: HW3 is fabbed at Samsung’s 14nm US foundry, and HW4 likely on TSMC’s 7nm node, though rumored to be a 4nm-class node. These computers have their own ARM-based CPU, rumored to be based on Samsung’s Exynos IP, as well as multiple AI accelerators for redundancy: HW3 supposedly has 2 while HW4 has 3. That means if one accelerator fails, the system can swap to another in real time; it also means, in theory, running multiple models at once. The neural accelerators do use slightly different architectures across generations, but Tesla was able to emulate the HW4 accelerators on the HW3 kernel for feature parity. This is all pieced together from different sources, conversations, and my own reading, and could be off, but it’s my best understanding of what’s known.
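The redundancy claim is easier to picture in code: run inference on the primary accelerator and fall back to a spare if it stops responding. The sketch below is my own guess at the pattern; Tesla’s actual failover logic isn’t public.

```python
# Hypothetical sketch of accelerator failover; Tesla's real scheme isn't public.
from typing import Callable, Optional

class AcceleratorPool:
    """Run inference on the first healthy accelerator in the list."""

    def __init__(self, accelerators: list[Callable]):
        # Each entry runs the model on one accelerator and raises
        # RuntimeError if that accelerator has failed.
        self.accelerators = accelerators

    def infer(self, frame) -> Optional[dict]:
        for run in self.accelerators:
            try:
                return run(frame)   # healthy accelerator answers
            except RuntimeError:
                continue            # dead accelerator, try the next one
        return None                 # all accelerators down: degrade gracefully

def healthy(frame):
    return {"steer": 0.0, "accel": 0.1}

def failed(frame):
    raise RuntimeError("accelerator offline")

pool = AcceleratorPool([failed, healthy])  # primary dead, spare takes over
print(pool.infer("camera_frame"))          # {'steer': 0.0, 'accel': 0.1}
```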
Full Self Driving (Today)
Now, back to the Full Self Driving AI model. The stack is currently a single release across all hardware versions, made up of a few different models designed to power different features: right now there are models for normal roads, Actually Smart Summon, Autopark, and the highway. Most of these are end-to-end models (the highway stack is still a CNN as of this writing; more on that below), trained on real-world driving data as well as synthetic data the Tesla AI team generates in Unreal Engine for edge cases.
These all exclusively use Tesla Vision, the name of Tesla’s vision-only technology. No radar, lidar, or ultrasonic sensors. I’m not going to debate the accuracy of cameras-only versus radar/lidar/ultrasonic redundancy; Tesla believes they can do it, and I trust them. This is an ideological difference from Waymo, Cruise, and others, and we’ll see how it plays out! I think Tesla Vision can work just as well as the alternatives, especially given the data Tesla has and the rate of improvement of Full Self Driving (Supervised).
Speaking of Full Self Driving (Supervised), I have it on my 2024 Tesla Model 3 Long Range, powered by AI4 (HW4) hardware. Currently, I’m running the FSD 12.5.4.1 model. This has 3 E2E (end-to-end) models and one CNN (convolutional neural network) based model. Actually Smart Summon, normal roads, and Autopark are all E2E models, and they’re great. As of October 11, 2024, my car has driven 65% of my miles over the past 14 days; that’s 321.3 miles out of 492 total. It drove me 131 miles round trip on both highway and surface roads yesterday, and I didn’t need to touch the steering wheel once.
Smart Summon makes mistakes here and there, but they’re things like turning left rather than right in a parking lot, so it takes a little extra time to reach me. Not a big deal. Autopark has been flawless since I got access to it, even parallel parking in a New York City spot that I would not have been comfortable attempting myself. Normal roads? It’s getting better with each update. During the day, it’s amazing: it goes at a safe and comfortable speed, largely avoids curbing itself (which was a problem on the 12.3.6 model), and all around feels smooth. At night, it does have issues with phantom braking, where it thinks there’s something in the road and just stops. Generally, this happens when the main camera is blinded by an oncoming SUV. Phantom braking never happens when there are cars behind me, and it’s generally more of an annoyance than a real problem.
These E2E models aren’t perfect, but there’s been an average cadence of an improved model every 8 or so days, with new models bringing new features every few weeks. Currently, FSD 12.5.6 is in testing with Tesla’s early access group; it makes the highway stack E2E rather than a CNN, which makes highway driving more natural and safer. They’re also working on adding reversing, parking, and auto-parking support to the road stack. That would mean the model could go fully from point A to point B without any extra human control, like selecting a parking spot or pulling out of one to reach the street, which is required now.
Again, Supervised FSD as it exists today, on my car and on anyone’s Tesla from roughly 2017 onward, is not perfect. It has issues, it does things that would generally be considered rude or unsafe, and it can make mistakes. The thing is, every update makes those occurrences rarer. I also believe Tesla is prioritizing the single trained model they claim to be working on, one that combines the 4 individual stacks, over fixing individual bugs like phantom braking. It’s not worth fixing a bug that could be reintroduced by an architecture change when you can finalize the architecture first and then fix the bug.
My thesis is that as Tesla builds out the compute at their Texas Gigafactory, the 100K-H100 data center and the in-house Dojo supercomputer with its system-on-wafer processors, the models will improve exponentially. We’re already seeing that in the rate of improvement of the models on these cars. More compute means more memory, larger data sets, and faster training for improvements.
The Reality of Autonomy and My Take
All of this sounds great, and is realistic if you follow it closely, but Elon Musk has made promises like this before. Everything I’ve been talking about was supposed to arrive years ago: the E2E highway stack was supposed to launch 2 years ago, Robotaxis in 2020, and more. He made promises on Cybertruck pricing that were off by around $20,000, though in his defense a lot changed unpredictably between 2019 and 2023, when deliveries started. The Roadster was supposed to launch years ago, after taking full payments, and still hasn’t.
I think the difference between then and now is that we can see how these new announcements will be delivered. With the current supply chain, where Tesla can already make $35,000 cars, stripping away parts and seats and shrinking the vehicle leaves me with no doubt about sub-$20,000. With FSD, years ago this was all a pipe dream, but now we’re seeing consistent improvement hitting actual cars! The model is 80% there; within a year, I could see it getting 95-99% of the way there. The years of past promises were way off on timeline, far too optimistic as Elon himself admitted at the We, Robot event, but now it feels like it’s actually happening. As one Twitter user put it, the vibes were right, and I tend to agree.
I believe we are finally at a point where the silicon is there, the software architecture is there, the vehicles are there, and the data is there. Lots of companies are building out silicon and hardware stacks for autonomy. Nvidia has its DRIVE platform, which includes cameras, radar, silicon, and a software SDK. Qualcomm has a whole suite under Snapdragon Automotive, with ADAS (advanced driver assistance systems) and connected cockpit offerings. MediaTek has Dimensity Auto Cockpit, with RTX graphics to accelerate infotainment systems.
The problem I see here is data and architecture: Tesla seems to be the only one with both, and they are willing to license the technology out. That doesn’t mean these other companies have no place. As I explained earlier, Tesla has a dedicated FSD computer and a separate system computer; it might be worth these vendors improving the system-computer side while working with Tesla on FSD connectivity and support.
Mercedes is working with Nvidia on their ADAS system, and it’s apparently great. BMW is working with Qualcomm and AWS, and it’s apparently great as well. Ford and GM have in-house ADAS systems too. I can’t comment on any of these, as I haven’t tried them yet. Other automotive companies like Volvo, Polestar, Rivian, and Xiaomi are also working with Nvidia on autonomy stacks and ADAS, though not quite as advanced yet as Mercedes’, BMW’s, or Tesla’s.
The interesting part of these systems is the autonomy levels. The way they’re generally understood:
- Level 0 is no assistance at all.
- Level 1 is a single assistance feature, like steering or braking assistance.
- Level 2 is ADAS, where the vehicle can control acceleration, braking, steering, and parking, but the driver must stay supervising and ready to take over without being alerted.
- Level 3 is a more advanced ADAS, where the system alerts the driver when they need to take over; otherwise the driver can do other things and not pay attention.
- Level 4 is fully autonomous but geofenced, like Waymo.
- Level 5 is fully autonomous with no geofencing.
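Boiled down to data, the distinction that matters most is whether the driver has to pay constant attention. A rough sketch (the descriptions just paraphrase the list above; nothing here is official SAE text):

```python
# Rough summary of the autonomy levels; descriptions paraphrase the list above.
SAE_LEVELS = {
    0: "No assistance",
    1: "Single assist feature (steering OR braking)",
    2: "ADAS: steers/brakes/accelerates, driver supervises, unprompted takeover",
    3: "Conditional autonomy: system alerts the driver when takeover is needed",
    4: "Full autonomy within a geofence (e.g., Waymo)",
    5: "Full autonomy, no geofence",
}

def requires_constant_attention(level: int) -> bool:
    """Levels 0-2 need a supervising driver; levels 3+ do not."""
    return level <= 2

print(requires_constant_attention(2))  # True: Tesla FSD today
print(requires_constant_attention(4))  # False: Waymo in its geofence
```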
Currently, Mercedes is the only manufacturer in the US with a Level 3 system; Tesla FSD is still technically Level 2. The interesting part of these rankings is that even if Tesla’s Level 2 system outperforms Mercedes’ Level 3 or Waymo’s Level 4, it’s still a Level 2 system because of the requirement for unprompted manual takeover and constant driver awareness.
I believe the reason Tesla stays at Level 2 comes down to how the E2E model works. From my understanding of modern AI models, for Tesla’s E2E model to alert the driver that they need to take over, it would take an architectural change for the model to understand its own limitations, and a miscalibrated version of that could force the driver to take control when they really wouldn’t have to. This seems like a deliberate choice, and focusing on producing a model so good that no human intervention is required seems like the logical next step.
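To illustrate why that’s an architectural question and not just a settings toggle: a Level 3 system has to act on the model’s estimate of its own competence, which means the model needs a calibrated confidence output in the first place. A hypothetical sketch of the decision logic (the threshold, names, and confidence head are all my invention):

```python
# Hypothetical Level 3-style handover logic; the confidence head and the
# threshold are my illustration, not anything Tesla has described.
TAKEOVER_THRESHOLD = 0.85  # invented number

def drive_step(model_output: dict) -> str:
    """Choose between autonomous control and requesting driver takeover.

    model_output is assumed to contain:
      "controls"   - the steering/acceleration command
      "confidence" - the model's calibrated estimate that its own command
                     is safe (this extra head is the architectural change
                     discussed above)
    """
    if model_output["confidence"] >= TAKEOVER_THRESHOLD:
        return f"apply controls: {model_output['controls']}"
    # A miscalibrated confidence head lands here unnecessarily, forcing
    # takeovers the driver never actually needed.
    return "alert driver: take over within N seconds"

print(drive_step({"controls": (0.02, 0.1), "confidence": 0.97}))
print(drive_step({"controls": (0.40, -0.5), "confidence": 0.30}))
```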
Even beyond pure autonomy, in terms of experience Tesla is simply the best. I feel safe saying there is not a single in-car experience that is as good: type in a location, press a button, and the stack brings you there. The maps are great, audio and music playback is great, and the whole thing is simple, easy, and user-friendly. That is not true for anyone else, even if the autonomy stack were equal.
Autonomy is coming within this decade, and AI is about to become a lot more interesting and useful than LLMs. The age of consumer robotics is coming, and it’s coming fast.