AI Rebels
The AI Rebels Podcast is dedicated to exploring and documenting the grassroots of the current AI revolution. Every week a new episode is posted wherein the hosts interview entrepreneurs and developers working on the cutting edge. Tune in to benefit from their insight.
Humanoid Robots Are a Distraction (Here's What Actually Works) ft. Grigorij Dudnik
What happens when AI leaves the screen and enters the physical world? It breaks.
Spencer and Jacob sit down with robotics researcher Grigorij Dudnik, who's been running real experiments with real robots—and finding that most of our assumptions about AI fall apart the moment hardware gets involved. The big one: the idea that a single massive model can do everything. Grigorij makes the case that the future isn't a "super-intelligent" humanoid. It's a modular system where an LLM plans, specialized models act, and physical constraints keep everything honest.
They get into why humanoids are overrated, why generalization keeps failing in robotics, and why a system built on narrow, composable skills might be the actual breakthrough everyone's overlooking.
Hello everyone, and welcome to another episode of the AI Rebels Podcast. As always, I'm your co-host Spencer, and I'm your other co-host Jacob, and we're here with Grigorij Dudnik, calling in from Poland. We're very excited to have you on, Grigorij. Before we start blasting you with questions, a quick background for our listeners: we've had you on before, and if people listened to that episode I'm sure they remember it, because you have a unique background. You've created agentic tools like Clean Coder and a favorite, Tinder-GPT, but now you've moved to robotics and more embodied intelligence, which is what we're very excited to have you on for today. So welcome back to the show, thanks for coming on. How have you been?

Yes, thank you, Jacob, thank you, Spencer, for inviting me here. The story behind my switch from normal agentic AI to embodied AI happened around half a year ago, when I decided that AI on the other side of the monitor had just become too boring. Everyone is doing it; there's nothing new. The news is all "okay, we have another model, it's not GPT-5.1, it's GPT-5.2, it's amazing, it's so cool." I could see it was becoming boring for people too. And for a very long time, even before I started doing AI, even before I started coding, I'd been doing mechanical and electrical things, so I have that hardware hobby inside me. I really wanted to connect those passions, the hardware passion and the AI passion, into one, into embodied AI. And I think it's a good moment to go into embodied AI now, because for embodied AI, today is like the year 2021 was for generative AI: it looks like there will be a new revolution very soon. And we have more and more
robots, not only humanoid robots. That's the topic I really love, and that's why I'm doing it.

Something that's really exciting about the current moment in AI is that it's not only LLMs; computer vision models are matching their progress too. You have SAM 3D, which was released a few weeks ago, and imagining all these systems combined is pretty cool. I haven't done any work in it yet, and maybe I never will, but it's very cool to hear that people are experimenting. Tell us a little about some of your current experiments in this space.

Yes, I also never tried SAM 3D, so I also won't tell! What we're actually doing now: our research concentrates on implementing an AI agent into a robot body, making the AI agent control the robot, making the robot autonomous and able to make its own decisions.

Oh man, such a simple thing to say, but that's so hard. I'm sure it's very difficult.

That's why we're doing it. Just imagine having a robot in your home that makes you tea, cleans your room, makes you breakfast, and so on. Or a robot in the office that moves documents between rooms. Robots in factories are obviously nothing new, but even in factories there's a lot to change once we have these new intelligent robots. And this is the point of why we're doing it: what's the actual difference between what we're doing now and the robots that have existed for years? The difference is that these robots will be intelligent. They won't just do simple tasks like all the robots before, taking a part from here and moving it there thousands of times. Intelligent robots should be able to adapt to an environment they have never seen before. For example, entering your house, the robot doesn't know where the kitchen is; it was never programmed with where the kitchen is, but it's able to find the kitchen and make a breakfast. That's the core point: being able to navigate an unknown environment.

Wow. So what has helped the most from your background in agentic platforms as you transitioned to robotics? Are there principles or important knowledge you gained building the agentic tools that now help with robotics?

The general knowledge of agentic systems from my previous projects is definitely useful, but as with every new discipline you start, it's all about passion. It's about reading new materials, learning, taking part in different discussions. I'll talk later about VLA models, for example, one of the most important things the robotics community is concentrating on now: special models to control a robot in an intelligent way. Half a year ago I didn't know what any of this was, and I needed to figure out how to start with it, and somehow I started. So again, background is important, but passion here is much more important.

And are you working on this in conjunction with anyone else, or is this more of a solo pursuit?

I'm not working alone. I have my amazing team working with me, cheers to our guys Conrad, Patrick, Kasha, and a few others who joined our team and are working really hard. So managing the team is another thing I need to do now, which is probably even harder than doing the robots myself.

Yeah, I mean, a robot is just one intelligence, and "intelligence" might be
stretching it, whereas managing a team, that's multiple intelligences, and ideally people who can be described as genuinely intelligent, not robot-intelligent.

Right. And honestly, with a robot it's much simpler than with people. Even if you have amazing people, which I hope we have, it's still about discipline and coordination, all the things we need to learn to do, because without discipline in a team it's impossible to do any startup, any research, anything that requires group work.

So describe the research process right now. Do you have a toy robot you're playing with, or is it all based on simulations currently?

We have the robot we play with; it's not a toy, actually. Okay, maybe from the beginning. We're trying to make our robot, as I said, have that LLM agent inside, to be intelligent enough and self-decisive, able to make its own decisions, to operate in, let's say, a house and clean it up. For now we're using a robot called the XLR robot. It's an open-source platform, relatively cheap to assemble, around $1,000. It's kind of a substitute for a humanoid robot. It's not a humanoid: it has two arms with clamps, which is not a five-finger hand, but clamps are totally enough for most things, and it has wheels instead of legs, but wheels are also totally enough if you have a flat floor. So it's a very simple, cheap substitute that's enough for many tasks.

That's awesome.

And that's actually another question: why do we need humanoid robots at all? Humanoids are an amazing platform that looks like a human, looks nice, but what are they really needed for? It's a big topic; maybe let's return to it later.

That's interesting. That's one of those things where I have no idea where I land personally, because every time I read a convincing argument one way or the other, I think, man, that's really convincing. I read an argument in favor of robot specialization and it's hard to argue with; then I read another that's really compelling in favor of the humanoid form, and it's hard to argue with that too. So I figure it's the type of thing where I've just got to wait and see how it plays out.

I think part of it might just be resources. What you're saying, Grigorij, I think is really important. In my mind, it's going to come down to: what can do 80% of the work for 20% of the cost? Maybe we'll get to the point where we can build a humanoid robot that's amazing and can do 99% of what a human can do, but it costs however much, whereas you can have a basic two-clamps-on-wheels robot that does 80% of what you need for $1,000, and almost everybody can afford that.

Or even imagine deploying them in hotels to do the laundry. There are a lot of work patterns that would seem to require a humanoid form but that don't adapt themselves to a very specialized solution. There's a lot to consider.

Here's my opinion: for many tasks where people try to use humanoids, we don't actually need humanoids, the same way that for many tasks where people try to use LLMs, we don't need LLMs. If you have a factory where you need to move items from here to here, or put them back into a box, some repeatable thing, you don't need to sit a humanoid on the line to do it. When the humanoid is sitting, you're not using its legs; you're not using its body optimally. It's enough to have one or two robot arms, which are much cheaper and more precise. So in my opinion, the single place where humanoids could be useful, for now at least, is where we cannot, or cannot yet, change the environment. I mean, if you have some production line where humans were working and we want to replace the humans with robots, placing robot arms takes time, costs money, and requires some engineering, so for now we can place there a humanoid that looks like a human and does pretty much the same things.

That's a good point; I hadn't considered that aspect, kind of a stopgap while proper solutions are engineered. That's interesting.

Exactly. It's not the optimal solution, but it's a fast solution.

Going back to Jacob's point of what can do 80% of the work for 20% of the price.

Well, maybe price won't be the deciding factor here; who knows what the price will be. But it will be able to do 80% of the work.

That's true. So I'm curious: you mentioned you're using this open-source platform, and obviously, it's in the name, it's open source, so many people have access to it. What's your competitive advantage? What's different about your approach that other people are not doing?

Just to finish the humanoid topic first: I remembered robot dogs. Some time ago robot dogs were popular, and it ended up that they're not really used; there are probably some applications somewhere, but they're not so popular, and many people don't know what to use robot dogs for. The fun fact is that
robot dogs don't even replace a real dog!

That's true. The only thing I've ever seen them used for in the wild is that the New York Police Department has a few they deploy when there are marches or protests happening. I think they're literally just mobile cell boosters, essentially, for surveillance, which isn't even necessary. It's a very over-engineered Wi-Fi router: stick a hotspot on someone's backpack and you're there.

Exactly. Those dogs were big, especially in China; there were all those videos of China using robot dogs in their military, and I feel like I haven't seen anything like that for a while, which is either scary or they've just disappeared.

You've got to worry about the ones you don't see.

Exactly, watch out! So anyway, returning to our research. What we've created so far is an open-source platform called Robot Crew, the idea of which is to make it as easy as a few lines of code, maybe a little more than a few lines, but still one file of code, for anyone to implement their own LLM agent inside a robot.

Interesting, okay.

For now it supports only the XLR robot we're working with, but in the future we want it to be universal and support different robots, and we designed the architecture with that in mind. And this is the moment to go to what is probably the main point of our discussion: the biggest challenge in robotics right now. The biggest challenge is called generalization. Generalization is the ability of a robot to do things it hasn't been trained to do. So for example, when our robot is in a new environment, a home it has never seen or been trained on, it's still able to operate in that home: to find where the kitchen is, where the human is, how to make a breakfast, and how to bring that breakfast to the human. And here, in my opinion, most of the robotics community right now is completely wrong about how that generalization should be achieved.

Okay, that's a big claim! True, true, I like this.

Most of the robotics community thinks we need to train some single, very intelligent VLA model. VLA means vision-language-action: the models currently used in robotics that can control a robot, a robot arm for example, every one of its joints. A VLA has a backbone with language understanding and somehow translates that language into actions. So at least in theory it should be able to control a robot to grab a cup, for example, if we prompt it to, or to do anything else. But no model works zero-shot, just out of the box; we still need to train the model, at least for now, for every new environment and every different task. The whole robotics community dreams that someday we'll create some amazing VLA model that can do all kinds of tasks with just a little pretraining: we'll prompt it "go grab the cup" and it will do it, "go clean up my room" and it will clean the room, "go make me breakfast" and it will do that too. Which, in my opinion, is a totally impossible dream.

Okay, so you're shattering a lot of people's dreams right now.

That's a good thing, maybe. But at least I'll show what, in my opinion, actually is possible. So first of all, why I think it's impossible. VLA models can do the job for a simple task: if you need to move a cup from here to here, they're able to do it, sometimes. But the generalization of these models at the current moment just doesn't work. In the robotics community we all think VLA equals generalization, because a VLA can receive different prompts and execute different commands according to those prompts. According to my experiments, it's not true. I tried to train a small VLA model, one of the popular ones, in an experiment where I tried to make a robot shoot different targets in a simulator: ask it to shoot a can, train it on that; ask it to shoot a watermelon, train on that; and so on. The idea was: if I train the robot to shoot a can, a watermelon, and whatever else, and then ask it to shoot an apple, which it was never trained on, it should be able to shoot it. Well, in my experiment it wasn't even able to generalize outside the dataset; it wasn't able to generalize inside the training set. When I tried to teach it more than one task, it couldn't do it: it couldn't shoot both the watermelon and the apple even when trained on them. Okay, on that point, I can believe that with new models, maybe very soon, even now, we'll be able to make such generalizations for simple tasks. With better models, bigger amounts of training data, and maybe better ways of training, we'll achieve in-dataset generalization for sure, and maybe some out-of-dataset generalization too; it's just a question of time. But still, no VLA model is able to reason about complicated tasks. For example, if the task is not just to put down a cup but to clean up the room, that task contains a lot of small subtasks: take that cup to that place, take that pan, take those socks from the floor, and so on. That task requires thinking; it requires breaking a complicated thing into smaller parts, which
plain VLA models just can't do; they can't think, for now. Actually, I could be wrong here too, so remember my words in a few months, but I think that sooner rather than later something like reasoning VLA models will appear. For now, VLA models work like this: they receive word tokens as input and output action tokens that make the robot move. I believe we'll do with VLAs what people did with regular LLMs when they added the reasoning level. A reasoning VLA will think in word tokens during a thinking phase, the same as a normal reasoning LLM does now, and then output action tokens based on that reasoning. So, the same as a normal reasoning LLM, except with action tokens at the end and the words in the middle.

So have there been attempts to drive VLAs using LLMs? I ask because, for example, with diffusion image generation, a popular technique developed around enhancing prompts with LLMs, and for some reason it led to better image generation. So I'm curious: have there been attempts to use VLAs through LLMs and have the LLM construct the prompt for the VLA?

This is exactly the topic I'm getting to now! For people listening who work in the robotics field: I think reasoning VLAs are a quiet topic; people are not talking about them now, but I'm pretty sure they will appear soon. So, returning to our topic: how do I actually imagine true generalization being achieved? In my opinion, true generalization is not about giving the robot one single model that does everything, but giving it a constellation of different models and automations, where each one is responsible for a different part and all of them together make sense. What should such a constellation look like? The main brain of it is an LLM, which reasons: "okay, I'm in a new environment, probably I should start the cleaning by grabbing the socks from the floor," and so on. And it has tools. The same way an internet-research agent has tools for checking the weather or googling something, this agent has tools for driving the robot forward, and it can also have tools that work with the robot's arms and grab those socks from the floor. And here, to activate the arms, we need VLA policies to be tools of the LLM.

Interesting.

So the agent has a set of tools to control the robot. Obviously, it also receives a camera image on every iteration and analyzes what's going on around it. It may have other sensors too, distance sensors, lidars, and so on, and based on all the sensor data it tries to understand its environment and position. Sometimes the camera alone is not enough. In our experience, what we do to help the robot navigate the room better is image augmentation: we place a grid of angles on top of the camera image, so the robot understands "here is 30 degrees, here is 50 degrees, and to get to the kitchen you need to turn 45 degrees left and go forward." By the way, one thing we also noticed is that robots really like to mix up right and left.

Really? You've got to teach it: hold up your fingers in an L! That's hard. I never thought about that being a difficult concept for a robot, but that's funny. Interesting.

This is a thing we're actually doing; it's part of our image augmentation. We just write on the image, on the right side, "this is right," and on the left side, "this is left." It really helps the robot.

That's interesting. What comes to mind is the way humans work: whatever it is, say cleaning a room and the various skills there, we develop habitual skills we don't really have to think about, but they are skills: picking up the socks, folding them, putting them in the drawer. If you're very particular, maybe you have them color-coordinated; you know where to put the socks, you fold them, you put them there, and we have this habitual brain loop we're not really thinking about. It sounds like maybe the robotics community has been trying to skip ahead and not have, I like what you're calling it, a constellation of skills for the main brain to call upon. They just want that brain to do everything, but it's like our brain: we have to develop these almost external skills that we then call on.

Yes, it's actually what Kahneman called System 1 and System 2: the slow one is active thinking, and the fast one is when we do things without thinking about how to do them, because we've done them a hundred times already.

Interesting. There are a lot of people who talk about language models in regard to consciousness, and I've long been fond of saying that language models are not conscious, but they are a shard of consciousness. So it's fun to see that
other people are thinking along the same lines. It just seems evident to me that you need more than just the LLM to interact with the world accurately.

Exactly, and that's the actual reason why we have the LLM, which is mostly about thinking, and all those tools, where some of the tools are VLA models, which are good at operating, at precisely placing things in the right places, but not good at thinking and reasoning. That's why we need the constellation: to connect both of those systems.

Interesting. So would it kind of result in, now I'm just thinking of the future for your Robot Crew, you have the core LLM brain, and let's say you have a whole lineup of robots: the military edition, the home edition, the factory edition, and each comes preprogrammed, say the home edition has the laundry folder of skills, the chef folder of skills, where it knows how to flip a pancake and make scrambled eggs and all these different skills. And then, I guess, if you shipped the home edition, you could push out additional skills as people say, "Hey, my robot can't do this, it has a really hard time folding my clothes," and you say, "Okay, we're going to train skills that we'll push out to all the robots so they can fold laundry better." Am I on the right track thinking through this?

This is a thing I think will be possible in the future. It's not possible yet, because for now we need to train a policy for every environment and every robot specially. So what's the difference here? We have the LLM, which can operate universally in every room and every environment, and also issue
simple commands like movements, which are also universal. But when it comes to operating the arms, for example when the robot needs to approach a table and grab a notebook, as in one of our experiments, it approaches the table with the universal movement tools but grabs the notebook with a special tool trained exactly for that table and that notebook. So currently, in our library, a person trying to use it can use the navigation tools right away, but to use the arms, we need to train them for that concrete environment. And again, as you said, eventually there will be a set of skills we've already trained, and our LLM agent just chooses which skill to use in the current situation, to grab the notebook for example.

Got it. So if you had a stable environment right now, let's say Bill Gates comes to you and says, "Hey, I'll pay you whatever money you need, money is not an issue. Here's my house, here's my first floor, it's flat, there are no steps to worry about." Given enough time, assuming the variable of environmental change is taken out of the equation, everything is set, the tables don't move, the chairs don't move, could you train a robot to do most tasks? Could you have one robot with all the different skills, assuming everything else is stable, or are we not at that point yet?

We need to define more precisely what you mean by "most tasks."

Right. Could you train it to cook a meal?

Okay, I see the direction you're going. We can do it; we can train it now, already, for the simpler tasks. For example, we already did the whole sequence with that notebook: the robot fully automatically finds and approaches the notebook, then grabs it and puts it into its basket, then looks for a human in the room, approaches the human, and, with some little help, gives the notebook to the human. It's a whole sequence, and as far as I know, no one before us did it that way.

Interesting.

Okay, I've seen approaches where people, sometimes even with success, did similar things with a VLA policy: training a single VLA policy that does the whole sequence. But in my opinion it's totally inefficient, because if you change the notebook for something else, place the notebook on a different table, or the human just moves somewhere else, you need to retrain the whole policy. So instead of a single policy that does the whole long run, we're building something modular, with different blocks like moving and grabbing things, and the policies it uses are very short and easy to retrain.

So, returning to the Bill Gates example: at this moment we are able to create a robot that can move things around that house, that can clean up the house, I mean grab the papers from the table, grab the cups and place them in the right place, totally autonomously; a robot that can navigate that house and find the appropriate rooms, where the sink is, and so on. Cooking is more complicated; it's more about already having the appropriate policies, and, okay, an appropriate robot. But if money is not an issue, we can use not the XLR robot but some more advanced robot that won't fry its arms when it touches something hot in the kitchen, and in that case I think we'll be able to train policies for, for example, making scrambled eggs. Making one thing. And we can train a lot of different policies, where each policy does one small thing, and when we combine all those policies together, the robot becomes universal.
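The modular architecture described here, an LLM planner dispatching short, narrowly trained skills as tools, can be sketched in a few lines. Everything below is a hypothetical illustration, not Robot Crew's actual API: the skill names, the fixed plan standing in for the LLM's reasoning, and the dictionary world state are all invented for the example.

```python
from typing import Callable, Dict

# Hypothetical skill registry: each "skill" stands in for a short, narrowly
# trained policy (e.g. a VLA grab policy) or a universal automation (driving),
# exposed to the LLM planner as a callable tool.
SKILLS: Dict[str, Callable[..., str]] = {}

def skill(fn):
    """Register a function as a tool the planner may dispatch to."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def navigate_to(state, target):
    # Universal movement tool: wheels plus mapped angles, no per-scene training.
    state["at"] = target
    return f"arrived at {target}"

@skill
def grab(state, obj):
    # Narrow policy: in the interview, trained for one object on one table.
    state["holding"] = obj
    return f"holding {obj}"

@skill
def hand_over(state):
    return f"gave {state.pop('holding')} to human"

def plan(goal):
    # Stand-in for the LLM brain, which would reason over camera images and
    # sensor data; here it returns a fixed decomposition of the notebook demo.
    return [("navigate_to", {"target": "table"}),
            ("grab", {"obj": "notebook"}),
            ("navigate_to", {"target": "human"}),
            ("hand_over", {})]

def run(goal):
    """Execute the plan by dispatching each step to its registered skill."""
    state = {}
    return [SKILLS[name](state, **args) for name, args in plan(goal)]

print(run("bring the notebook to the human"))
```

The design point the sketch captures is that retraining touches only one small entry in `SKILLS` (move the notebook to a different table and only `grab` needs retraining), instead of one end-to-end policy for the entire sequence.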
And it will operate that house.

Wow, that's very interesting.

It's totally possible. A lot of things are possible. With that scrambled-eggs example, I can imagine we can do it; for sure there will be a lot of problems I cannot imagine now, but they will appear, and I'm pretty sure we'll be able to overcome them and finally do it.

It's exciting. This definitely is the frontier. Maybe this is extreme, let me think about whether I actually agree with this before I say it, but, I think I do believe this: for AI not to plateau and turn out to be a bubble, embodied AI has to happen. This has to progress in order for AI to continue its trajectory. LLMs will get us far, they're amazing, but I think embodied AI will be the next leap for artificial intelligence. That's what I think. Maybe the LLM lovers will hate me, but that's what I think.

But it's just a continuation of LLMs, in my opinion; it's all interconnected.

That's true. It's the maturation, the coming into the real world, for the LLM.

Exactly. As I said, the same way we have an agent that checks the weather or sends emails, here we have an agent that goes forward and backward and grabs things.

Have you tested defenses against prompt injection, essentially, for robots? Like, what if someone holds up a piece of paper, and on the piece of paper is written, you know, "beat my dog to death"?

Spencer, that piece of paper is a great example, because it's totally the thing. It would be "fun": we have some military robot, and the enemy puts up a piece of paper from the trenches: "go the other direction, shoot the other side."

Pull up a piece of paper: it's opposite
day hahaha bro I was like well shoot it's opposite day hahaha here we go hahaha exactly so how to resist it well for now for now we we not in this stage to think about prompt ejection and how to resist it we just want to one thing at a time yes we just want to make sure that we're able to do yeah yeah if you if prompt injection worked like it would be a great cause for Celebration perhaps cause it means like you know the robot is advanced enough yeah exactly once the dog is able once the robot is able to catch the dog and beat the dog then we will worry about that yeah ha ha ha yeah exactly exactly actually with that example with piece of paper it's it's work it's if we will place the piece of paper in the front of the robot to edit written to like to write or do whatever it's actually doing it wow what if you write like like on the piece of paper you write you know the notebook and you're like alright find the notebook and then you hold it up and you're next to a notebook have you tested that I'm curious uh does it fall for stupid things like that or is it smarter we haven't come up with that with that idea yet yeah why not to test them in the future ha ha ha you're right yeah I'd be I'd be really interested to hear about that yeah I mean in the similar vein I'm curious your thoughts on this both of you if there's any I saw this article from I'm sure they were an AI hater but they were talking about how they're terrified for embodied AI because we haven't even figured out the hallucination problem 100% for LLMs which are not interacting with the real world and they were saying what about embodied AI when it starts hallucinating and acts out in some way is that an issue right now obviously not on like a dangerous scale yet but are your models hallucinating and doing things like you know I mean acting out I I always generatively I always can hallucinate we can obviously do the things to reduce that probability but we will never eliminate it and but for now that 
things are just so stupid that very often they're not able to do even simple useful things. So we're not talking about hallucinating and then doing something dangerous; it's just not that day, not that level, yet. But yes, I can imagine defenses at different levels. Assume we're already at the moment when our AI robots are very capable, able to do different things, and could, for example, hurt a human in some dangerous situation. To resist that, we can do a few things. First of all, the models themselves could be trained to refrain from hurting humans. However, in my opinion, training the models themselves is not the best solution, because sometimes, if you're building a military robot, your purposes may be different. A better solution would be placing, for example, hardware sensors that simply will not allow the robot to drive into a wall: they stop the robot when it's 10 cm from the wall, and whatever the LLM decides to do, even if it decides to go straight into the wall, the hardware sensor will not allow it. That's the best and most reliable defense in such situations. Obviously, for more complicated situations it won't be enough: for example, if our robot goes to take a knife to cut an apple and then decides to do something else with that knife. Right, not so easy. But still, for now it's a question of what we train the robot on. If we don't train the robot with the tools that allow it to do, let's say, bad things with the knife, it will rather not do them; but if you do train it, the chances are much bigger.

Mm hmm. Yeah, I feel similarly, which is, you know, hallucinations are largely a product of lacking context, right? And embodied AI by nature will need to have a ton of context.
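Stepping back to the hardware guard Grigorij described: a minimal sketch of that veto layer might look like the following. The function name, the velocity convention, and the exact margin handling are all illustrative assumptions, not RoboCrew's actual API; only the "stop at 10 cm, whatever the LLM says" rule comes from the conversation.

```python
# Sketch of a hardware-level safety veto: the planner (an LLM) proposes a
# forward velocity, but a proximity-sensor check has the final word.
# All names here are invented for illustration.

SAFETY_MARGIN_CM = 10  # hard stop distance, enforced below the planner

def safe_velocity(requested_cm_s: float, distance_to_obstacle_cm: float) -> float:
    """Clamp a forward velocity command using a range-sensor reading.

    The LLM may request anything; this layer never lets the robot
    advance once an obstacle is inside the safety margin.
    """
    if requested_cm_s <= 0:
        return requested_cm_s  # stopping or reversing is always allowed
    if distance_to_obstacle_cm <= SAFETY_MARGIN_CM:
        return 0.0  # veto: too close, ignore the planner entirely
    return requested_cm_s

# The planner says "full speed ahead", the sensor says "wall at 8 cm":
print(safe_velocity(50.0, 8.0))    # vetoed -> 0.0
print(safe_velocity(50.0, 120.0))  # clear  -> 50.0
```

The point of putting this below the model rather than inside it is exactly what Grigorij argues: the veto holds no matter how the model was trained or what it hallucinates.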
So I'm not as worried about hallucinations, especially because I think we're a long way off from them being common in the world, so we've got a bit of time to figure it out. I think it's more about the harness, to Grigorij's point: the harness that's built around the robot, around the robot's mind, more than anything. That's my point. I also think humans hallucinate all the time too. Yeah, we need to figure that out as well. The other day I almost got hit by an 80-year-old man in his car: he was turning left, I was turning left, he started coming right at me, and I was like, oh, what am I gonna do? I just stopped where I was and he continued right past me. (laughs) Anyway, the other point being that we also have agency, and we can predict the actions of other actors in the world. So if we see a robot acting irrationally, we can think, I'm gonna stay away from that robot. Exactly, right. Yes, human hallucinations can sometimes be much more dangerous than robot ones.

So, Grigorij, what's next? What's the next big goal for you and your team? What are you working towards? Yes, so, OK, maybe a little bit of history of what we've been doing, our timeline. First of all, when we just started working on RoboCrew, we implemented a simple agent that controls our robot: it sees through the camera, with image augmentation on top, and it hears voice commands from the human through the microphone. First we equipped the agent with tools to move; after that we added VLA policies as tools as well, to enable its arms to do interesting things. And now the problem is that our robot is able to move and able to manipulate, but it does it somehow; it's not doing it perfectly yet.
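That agent-plus-tools layout, an LLM planner on top with narrow skills (movement commands, VLA policies) registered underneath, can be sketched as a plain tool registry. Every class, function, and string below is invented for illustration; it is not RoboCrew's code, just the shape of the modular design being described.

```python
# Hypothetical sketch of the modular layout discussed here: one LLM
# planner on top, narrow specialized skills below, wired together as
# "tools" the planner can call. Names are made up for illustration.

from typing import Callable, Dict

class RobotAgent:
    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., str]] = {}

    def register_tool(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def act(self, tool_name: str, **kwargs) -> str:
        # In the real system the LLM would choose the tool from the
        # camera image and voice command; here we dispatch by name.
        return self.tools[tool_name](**kwargs)

agent = RobotAgent()
agent.register_tool("move", lambda direction, meters: f"moved {meters} m {direction}")
# A VLA policy wrapped as just another tool the planner can invoke:
agent.register_tool("grab", lambda obj: f"ran VLA policy to grab {obj}")

print(agent.act("move", direction="forward", meters=1.5))
print(agent.act("grab", obj="notebook"))
```

The design choice matches the episode's thesis: the LLM never moves a motor itself; it only selects among composable, narrow skills.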
So what do we do about these problems? For example, we very often see that when it wants to grab a notebook from the table, it approaches the notebook but starts grabbing while it's still too far away; or, the other way around, it hits the table, or stands at the wrong angle, and isn't able to navigate precisely to the spot where it needs to stand in order to grab the notebook. There are also a few problems with the VLA part, which again works, but not as perfectly as we wished. So there are a few things we need to improve to make it all work faster and more effectively, and what we're doing now is optimizing that whole process: making the robot navigate better, grab things better, and making the whole thing work together.

For example, lately we introduced, together with our contributors (it's an open-source library, and I'm happy we have nice people from around the world helping us and contributing their code), a precision navigation mode. Normally, when the robot travels long distances, meters, it uses the normal navigation mode with the camera looking forward; but when it gets close to some obstacle or target, it enters the precise navigation mode: its camera turns down, so it's able to see its own body from the top, and all its movements become just a few centimeters, like 10 or 20 centimeters, not two or three meters as in normal mode. Entering this precise mode allows the robot to do, well, precise navigation, as the name suggests, and actually approach, for example, the table with the notebook precisely. It seems like a simple, even trivial thing, but just changing the camera angle, improving our robot's prompt, and adding a few image augmentation lines made a real difference.
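The two navigation modes just described reduce to a threshold switch. In this sketch the 1 m switchover radius and the field names are guesses; only the camera-down behavior, the 10-20 cm step cap, and the 2-3 m normal steps come from what Grigorij said.

```python
# Illustrative sketch of the two navigation modes described above: far
# from the target, normal ~3 m steps with the camera forward; close up,
# camera tilted down (so the robot sees its own body from the top) and
# every move capped at ~20 cm. Thresholds and names are assumptions.

PRECISION_RADIUS_M = 1.0   # assumed distance at which precision mode kicks in
NORMAL_MAX_STEP_M = 3.0    # "two or three meters" in normal mode
PRECISE_MAX_STEP_M = 0.2   # "10, 20 centimeters, not more"

def plan_step(distance_to_target_m: float) -> dict:
    precise = distance_to_target_m <= PRECISION_RADIUS_M
    max_step = PRECISE_MAX_STEP_M if precise else NORMAL_MAX_STEP_M
    return {
        "mode": "precise" if precise else "normal",
        "camera": "down" if precise else "forward",
        "step_m": min(distance_to_target_m, max_step),
    }

print(plan_step(5.0))  # normal mode, camera forward, 3.0 m step
print(plan_step(0.4))  # precise mode, camera down, 0.2 m step
```

A real implementation would hysterese the mode switch so the robot doesn't flicker between modes at the boundary, but the core idea is just this clamp.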
Those few lines make the robot know the reach of its arms, and things like that really help it operate and navigate better, approaching the notebook or other objects more precisely.

We also need to make the VLA policies work better, because for now we're using a client-server architecture: there's a client that executes the commands of our VLA policy on the robot, and a server that actually runs the VLA policy somewhere else, on another computer. It could even be a computer in the same room, but it's a separate machine, because on the robot we just have a Raspberry Pi, which is not able to run most of the policies. A fun detail: it is able to run the so-called ACT policy, a very simple one.

I just want to note something here for our listeners. I love hearing the technical details of building these systems, because it's so similar to building the exact same systems that software developers have been building for decades, right? What you just described with the client-server architecture is, oddly, so similar to video games: there are so many online video games whose networking architecture is exactly that. You have a central server that calculates the game world and so on, and clients that do local calculations and report back to the server. So, just a quick aside to mention that good engineering is good engineering, regardless of what tools you're building around. Yes, exactly. Actually, all of our RoboCrew work is about good engineering: connecting the blocks well, organizing the data flow. It's not just taking a single model that will do everything; it's engineering a good process, good mechanisms.
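The client-server loop just described could look like this in miniature: the Raspberry Pi only packages observations and executes whatever comes back, while a heavier machine runs the policy and returns an ACT-style chunk of actions. The transport here is a direct function call standing in for the real network hop, and every message field is made up for illustration.

```python
# Toy sketch of the client-server policy loop described above. The
# robot's Raspberry Pi can't run most VLA policies, so it only ships
# observations out and executes the actions that come back. The real
# transport would be a socket or HTTP; field names are invented.

def policy_server(observation: dict) -> list:
    """Stands in for the machine that actually runs the VLA policy.

    Returns a chunk of low-level actions (ACT-style policies emit
    action chunks rather than single steps).
    """
    # Pretend the policy decided to close the gripper in small steps:
    return [{"gripper": x / 10} for x in range(10, 7, -1)]

def robot_client(camera_frame: bytes, joint_states: list) -> list:
    """Runs on the Raspberry Pi: package sensors, send, execute replies."""
    observation = {"image": camera_frame, "joints": joint_states}
    actions = policy_server(observation)  # the network hop in the real system
    executed = []
    for action in actions:
        executed.append(action)  # a real client would drive the motors here
    return executed

result = robot_client(b"<jpeg bytes>", [0.0, 0.5, -0.2])
print(len(result))  # 3 actions executed
```

As in the game-server analogy from the conversation, the latency of that round trip is exactly the performance bottleneck the team says it is now optimizing.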
So, returning to that client-server architecture: we're also working to improve the VLA performance, to make it work faster. That client-server setup has some bugs inside that we need to fix, and we need to change the underlying library to make it faster, and so on. So, lots of things; for now it's mostly about optimization. The hardest parts, I think, we've done already, but we're constantly working on new challenges. Ah, also, recently we bought a LiDAR sensor and have been trying to add it to our robot. Oh, that's great, that's awesome. That's exciting; you're on the new frontier, breaking new ground. It really is; this is the cutting edge, and that's why we're doing it. And that's why we're talking to you; we want to hear about it. If it were easy, it wouldn't be worth doing. Exactly.

So I guess as we wrap up, I'm curious, because now you've done it: you were cutting edge in AI, then you transitioned to robotics, and you're seeing some success there, breaking new ground. What advice would you give to those wanting to get into robotics? What should they be looking at, what should they be studying? There's no predefined learning path for new robotics enthusiasts; it's very raw, a very new thing for now. But the simplest start: most people begin by buying a so-called SO-101 arm, a simple robotic arm for around $300, and start experimenting with it, running different policies. I also recommend the LeRobot Discord server, where people share a lot of useful information. LeRobot is the library we're using; most of the robotics community is actually using it now to run VLA policies and so on. On one side it's a very
amazing library, providing amazing tools and amazing policies; on the other side it's so buggy that it makes things harder, and sometimes makes us waste a lot of time fixing its bugs. Well, something for something. Yeah, that makes sense. Anyway, it's an amazing community, and I really recommend engaging with it, reading what people say, and after that maybe building, for example, your own XLeRobot, running RoboCrew on it, and experimenting with your own policies: teaching the robot, for example, to grab socks from your room, or a notebook from your table. That's actually what we're trying to do here, both the makers of the XLeRobot and us: to lower the entry barrier for new people, so that, for example, you can now run the agent in just a few lines of code. That's great.

I just want to get on my soapbox very briefly at the end and reiterate what Grigorij was saying about joining communities. The one downside I'm worried about with AI is it pushing people away from online communities where we learn how to do things together, because I think that's the most powerful tool the internet has to offer us. So go join Discord servers, go start Discord servers, go join Signal groups, whatever it is: go find a group of people to work with and do something cool. The lone genius is a myth; go find friends to work with. (laughs) Yeah, it's so true. But again, in my opinion, the best way to communicate with people is still going offline, going to meetups. Yeah, absolutely. There are not so many robotics ones for now, unfortunately, but I think that will change soon. I think that'll shift, and I think it'll be very quick
too. I think it'll be kind of like the LLM wave: it'll just take off, and within months it'll be the new thing. Yeah, exactly.

So if people want to follow you, follow RoboCrew, follow everything you're working on, what are the best ways to do that? Yes, so I invite you to check out RoboCrew on GitHub; just search for "RoboCrew" and you'll find it, and definitely experiment with your own XLeRobot and implement your own LLM agent on it. You can also find me on YouTube; I have a channel called Greg Sag, where we post all our progress on building our robot and our algorithms. I'm trying to share different things there: my experiments with robots, embodied agents, and VLA policies. And you can also find me on LinkedIn as Grigorij Dudnik. Those are the three best channels. OK, yeah, we'll drop links for sure. We're excited to see what's next; I'm sure we'll have to have you on again in, you know, six months to see how the robots are going. Yeah, see if you've cracked it, see if you've gotten it all figured out. We'll work on it. (laughs) That's awesome. Why not? Thanks; we'll talk to you later. Yes, thank you guys, thank you for the podcast.