AI Rebels

RAG, Agents, and the Future of AI Memory with Roie from Pinecone

Jacob and Spencer Season 4 Episode 18


Duration: 1:03:10

Most RAG implementations are fundamentally broken, and the company that coined "vector search" just told us why. In this episode, Roie from Pinecone breaks down the "Franken answer" problem plaguing AI systems, why naive retrieval falls apart at scale, and what most teams are getting wrong about evaluation. He reveals how the AutoGPT explosion nearly took down Pinecone's infrastructure overnight, and the radical architecture shift it forced them to build. We dig into why LLMs can't be trusted without grounding, what AI memory will actually look like in the age of agents and robots, and where the line between useful hallucination and dangerous fiction really sits. If you're building anything with RAG, vectors, or agents, this conversation will change how you think about it.

https://www.pinecone.io/

Welcome everyone to another episode of the AI Rebels podcast. As always, I am your co-host Spencer.

And I'm your other co-host Jacob. Today's guest we're very excited to have on: this is Roie from Pinecone. Most of you have probably heard of Pinecone, but if you have not, Pinecone powers the retrieval and memory layer behind thousands of AI systems, which means Roie really sees what actually works in production and what breaks at scale. We're going to be diving into all kinds of things: RAG, agents, memory, who knows what's in store. We are very excited to have you on, Roie. Thanks for being here.

Yeah, thanks for having me.

So I'm curious, Roie, about your story. How did you get here, to this role at Pinecone?

It's a funny story. I actually joined Pinecone almost three years ago, left two years after for a short stint at a company called Galileo, and then came back. Before that I worked in various startups; most notably in the AI space, I worked for a company called Mana, which did computational knowledge graphs. It was a very fascinating kind of technology, way before the LLM era started, so I had a lot of experience in what you would call traditional AI. From there, exploring this new field of semantic AI, of any kind of latent-space-based AI, became the next big thing, and I was obviously attracted to that as well. So I got a job at Pinecone, started off as a DevRel, and now, having come back, I head the DevRel team.

So, a boomerang: he left and came back.

That boomerang came back, yeah.

That's a good sign for a company, when people leave and want to come back.

Exactly. That's great. Okay, so let's just dig into Pinecone. Can you give us the big picture? What's the big idea that people should understand?
So Pinecone is where you go to build knowledgeable AI applications. People think about AI applications these days mostly in terms of the LLM they would use, and obviously LLMs are a huge component of these applications, but they're not the only component. Our ability to work with the idea of retrieval is one very interesting use case for Pinecone, but there are others: things like recommenders, anomaly detection, and more are the basis for a lot of the applications you see out there that are really looking at signals in a way that goes beyond just generative AI. But there's no doubt that the lion's share of use cases at Pinecone revolves around semantic search and retrieval, and that's what we offer most of our customers. The tricky part, I would say, is doing this well at scale. If you have small enough workloads, you won't have a problem using any of the in-memory solutions you can find on the market. But when you start thinking about tens of millions, hundreds of millions, and billions of vectors, which our bigger customers actually do, it becomes a completely different engineering challenge that calls for very, very deep technological solutions.

So you mentioned that Pinecone is there for you as you scale. We actually had a guy talking about pgvector on, oh gosh, forever ago, so I'd be curious: do you find that your customers are mostly people who have run into scale issues with pgvector, or do they reach straight for you guys and skip pgvector?
There have been cases where, again, performance is a huge issue: the speed of writes, the speed of reads, the concurrency. pgvector is a good solution for a very particular kind of workload, and what we find with our customers is that when they start hitting those limits, the question becomes: how do I migrate? How do I get to that p99 where the performance is where I'm looking for it to be? And I'm sorry to say, but that moment is very painful for a lot of people. It's something I tell a lot of people: you could theoretically use Postgres as the store for anything. You could store graphs in it, you could store documents in it, you could store whatever you want; you could use it as a blob store, theoretically. Should you do that? Probably not, if you're trying to build a production system that will scale ad infinitum. It's a question of what you want to maintain yourself versus what you don't want to maintain yourself. And that's the sweet spot where Pinecone really shines: the point where bigger teams have a lot of other concerns to deal with. They don't want to think about how to scale the vector store; they don't want to think about how to maintain peak performance for both the reads and the writes, or fine-tune exactly how their Kubernetes clusters get deployed, doing all those little modifications to get it just right.

Famously fantastic.

Exactly, and that's the pleasure of using a SaaS vector database that really knows how to scale and is battle tested.
And that's another thing we pride ourselves on: we're not just saying that this thing works; our customers speak for us. They attest to the durability and the performance of the platform as a whole.

You guys were one of the first people in the space, to my memory. It was you guys and I think there was one other that I remember from the early days; I'm blanking on their name.

No, Pinecone was definitely the first; other people came into the space after us. I think Pinecone was the company that coined the term "vector search" in the first place.

Wild.

It is wild. And there's another side to this, which is the reality we're contending with: vector search has become commoditized. There are fifteen different vendors working in this space, and the question is how you stay on top of your game and stay current, making sure you're answering the needs of the community in general and the industry in general. For us, the answer is, first of all, keeping our solutions as scalable as ever. There are more and more companies looking for the hundred-million and billion-scale solutions, which we can continue supporting. We've just announced DRN, our solution for dedicated read nodes: basically the ability to have nodes that are specialized for high-throughput reads.

Oh, interesting.

And again, in those situations it's very hard to fine-tune your infrastructure so that it knows how to optimize properly for those kinds of scenarios.
If you don't do that, what you end up with is instances that are just up, waiting for you to read from them, and that's obviously not cost effective. That's part of what our bigger customers expect to see. But there are other things as well. For example, the other product that is really picking up right now is called Pinecone Assistant, and you can think of it as essentially RAG in a box. Instead of thinking of RAG as this pipeline that goes step by step, where you have to think about all the different things like chunking and embedding and reranking and do all of those things manually, it gives you a very, very simple interface: you upload files, and on the other side you get an API that returns context snippets that you then inject into your LLM of choice, or you can use a chat endpoint that we provide with whatever model you want. It really takes away all that complexity from the RAG process, and that opens up a new way for people to think about RAG in general: instead of thinking about it as a pipeline, you can now think about it as essentially a building block. And that opens up the conversation to thinking about knowledge in a completely different way.

I love a lot of what you just said. One thing, big picture, that I really admire: Pinecone was obviously the first mover in this space, and, some people don't like this, but I really like this about the AI space, it's just so fast moving that it forces organizations to maintain a really lean startup mentality. Okay, everything's changed; we have to iterate. How do we stay relevant? How do we stay valuable?
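The "RAG in a box" idea described here, collapsing chunking, embedding, indexing, and retrieval behind an upload-and-query interface, can be sketched in miniature. This is a hypothetical toy, not the actual Pinecone Assistant API: keyword overlap stands in for real semantic search, and sentence splitting stands in for real chunking.

```python
class RagBox:
    """Toy 'RAG in a box': two calls (upload, context) instead of a
    hand-rolled chunk/embed/rerank pipeline. Purely illustrative."""

    def __init__(self):
        self._chunks = []

    def upload(self, text, source):
        # A real system would chunk, embed, and index here; we just split
        # on sentences and remember where each chunk came from.
        for sent in text.split(". "):
            if sent:
                self._chunks.append({"text": sent.rstrip("."), "source": source})

    def context(self, query):
        # A real system would run semantic search; keyword overlap is a stand-in.
        words = set(query.lower().split())
        return [c for c in self._chunks if words & set(c["text"].lower().split())]

box = RagBox()
box.upload("Toasters have a 12-month warranty. Keep away from water.", "manual.txt")
snippets = box.context("what warranty do toasters have")
print(len(snippets))  # 1: only the warranty sentence matches
```

The point of the building-block framing is exactly this surface area: the caller sees upload and query, and everything between them is the vendor's problem.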
Our core offering, our initial core offering for Pinecone, has kind of been commoditized at this point; how do we now continue to iterate, get better, and provide more value? It forces everybody up, which is stressful.

I have a follow-on question on that, and this is the last thing I'll ask about pgvector, then we'll move on. I don't want to make you talk about your competitor the whole time.

They're not quite a competitor. I just want to stress that they're not a competitor, because they're operating in a different vertical, I would say.

Okay. What's the proper word? A peer. Anyway, I don't want to make you talk about a company you're not a part of, but I'm really curious: when pgvector came along, was that something you guys anticipated, or something you had to react to?

No, we knew that people were going to try and bolt on a lot of the capability, because it was so valuable. And it's only natural. What's the role of a vector database in general? It basically bridges the two worlds of unstructured data and structured data: the world of meaning with the world of structured data. And it's natural to say, hey, I have a bunch of structured data, and I just want to reach out into the world of meaning and leverage the capabilities that lie there, so that the people who want to interact with my structured data can do so in a natural way. I use natural language, I use meaning, to convey my intent, as opposed to surface forms: I don't necessarily know the surface forms I'm looking for, but I know the meaning I'm looking for, and I want to interact with those systems in that way.
So obviously anyone who has structured data is going to want to add that layer, and pgvector is not the only one: MongoDB has something for it, Elasticsearch has something for it; basically anyone with a database said, hey, why don't we just glom this on. The problem, though, is that it's not a problem you can just tack on. At scale, you have to think about the fundamentals: how do you actually make an ANN algorithm work really efficiently at scale? And that doesn't happen overnight, and it doesn't happen effectively with bolt-on solutions.

I like that. I was actually going to ask about Elasticsearch, so you answered my question. Because I think it's an easy question to ask: oh well, there's Elasticsearch, there are all sorts of solutions. It's very interesting to me to consider vector search as a solution in and of itself, whereas all these others, like you said, are just bolting it on. It's not really a full solution centered around vector search directly; it's more like, hey, we have all these traditional heuristics, and if you want a vector score, we've got that too.

Right, and again, I just don't want to disparage anyone. It's not to say that these things won't work; they work for very particular use cases and different scales. Our pitch to developers is basically: start building. You can choose to start building your vector database in memory, you can use a file for that, there's no problem, you can do that. Do you then want to refactor your application to work with a scalable database? That's a choice you just have to make.
Personally, I would rather start with, and I do this all the time, the best SaaS offerings there are for the thing I'm trying to do. If I need an LLM, I could potentially use an open source model, and a lot of people do, but I enjoy the fact that I can just plug in either OpenAI or Anthropic, enjoy the advances they pour into their models, keep the same signature, have everything work out of the box, and sort of not think about it anymore as my system scales.

That makes a lot of sense. I'm curious: you've obviously seen lots of successful and failed implementations of Pinecone. What are some of the biggest stumbling blocks? Where do people have issues when they're trying to convert whatever they've previously been using to Pinecone, or, even larger than Pinecone, to more of a vector RAG approach?

So it's a little different for RAG; maybe I'll talk about RAG specifically, because the rest is more of your regular infrastructure growing pains. For RAG specifically, I feel like a lot of people still have this vibe mentality, and I'm not going to say anything bad about vibes, and I'm not talking about vibe coding. I'm talking about: hey, I'm going to develop something that includes an LLM inside, and I'm just going to figure out whether or not it works well by testing it out with anecdotal test cases, as opposed to really rigorously thinking about evaluation and how to go about it. I think most cases that hit roadblocks are those that didn't really think about how they're going to know whether their system is operating well or not.
In the deterministic world of software engineering, you had tests for everything. It's hard to have the exact same thing in the non-deterministic world of LLMs, but that's where you have evaluations. There are many companies that do this, and also many open source tools that help with it; we help with it in a lot of cases too, to help you understand whether or not the system you're building is actually doing the thing you intend it to do. And I think that is the core issue that then leads to a lot of different symptoms. People will come to us and say, hey, I don't really know how to do chunking, or I don't know if I'm using the correct embedding for this particular content I'm trying to work with, or I don't know if my reranker is working properly, et cetera. But those all stem from the lack of foresight and discipline around evaluation.

It's so interesting, because everyone's been talking about evaluations for a long time now, but a very common thing that Jake and I have seen, and now it's repeated with you as well: everyone is starting to think harder and harder about how we actually measure success and failure, and it's really interesting to watch everybody start thinking about it. It's kind of older news now, but there was that whole "95% of things fail" claim. One of my issues with that is: well, what are you defining as failure? What is success? I think at scale that's been the issue. AI came so hot and so fast that we didn't define those metrics.
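The evaluation discipline described here can start very small: a hand-labeled set of ground-truth questions and a single retrieval metric. A minimal sketch, with an illustrative stand-in retriever rather than any particular eval framework:

```python
# Tiny hand-labeled ground-truth set: each question names the source document
# a correct retrieval should surface. File names are illustrative.
ground_truth = [
    {"question": "What is the toaster warranty period?", "expected_source": "warranty.pdf"},
    {"question": "How do I clean the crumb tray?",       "expected_source": "manual.pdf"},
]

def fake_retriever(question):
    """Stand-in for a real retriever; returns the sources it would fetch."""
    return ["warranty.pdf"] if "warranty" in question else ["faq.pdf"]

def hit_rate(cases, retriever):
    """Fraction of questions whose expected source shows up in the results.
    One number that turns 'it seems fine' into something you can track."""
    hits = sum(1 for c in cases if c["expected_source"] in retriever(c["question"]))
    return hits / len(cases)

print(hit_rate(ground_truth, fake_retriever))  # 0.5: one hit, one miss
```

Even this toy harness answers the question anecdotal testing cannot: did this change to chunking, embeddings, or reranking make retrieval better or worse?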
We didn't define them; companies just said, we need AI in our board deck, so get it in there.

I think there's truth to that. I also think that the implementations of these technologies have mostly been naive. Most people I talk to these days who are building RAG solutions are essentially building naive RAG, meaning they take some content, they chunk it, they put it in some vector database, and then they have a query that gets embedded, they get results, they stick them into the LLM, voila: RAG. And that's okay, but it's not going to scale, and there are many, many reasons for that. The main reason is that if you have enough content, and your content is non-homogeneous, then you're going to end up with questions that retrieve content from different areas of your knowledge base, because those areas all have some semantic similarity to your query, and you're going to fuse them into what I call a Franken answer. A lot of people see this happen: you get two reasonable-sounding context snippets that each have a lot of semantic similarity to the query you had, but the fusion makes no sense, because one is from one data source and the other is from another. So you have to start thinking: okay, when I have a billion documents in my knowledge base, are they just one huge data source, or do I have to start segregating them in some way that is going to help my system understand what's going on and how to help my user? That's one level of it.
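The Franken-answer failure mode can be seen in a toy example: rank purely by vector similarity and two near-tied chunks from different documents win, so the prompt fuses facts about different products. The hand-made three-dimensional vectors below stand in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy knowledge base: chunks from three sources. Two of them are both
# semantically close to a "my toaster is broken" query, but describe
# different models from different documents.
chunks = [
    {"text": "Model T-100 toasters carry a 12-month warranty.",
     "source": "warranty.pdf", "vec": [0.9, 0.1, 0.2]},
    {"text": "Model T-200 toasters must not be immersed in water.",
     "source": "manual-t200.pdf", "vec": [0.85, 0.2, 0.1]},
    {"text": "Refrigerator compressors are serviced on-site.",
     "source": "manual-fridge.pdf", "vec": [0.1, 0.9, 0.3]},
]

def naive_retrieve(query_vec, k=2):
    """Naive RAG retrieval: top-k by similarity alone, ignoring which
    source each chunk came from."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]

query_vec = [0.88, 0.15, 0.15]  # stand-in embedding of "my toaster is broken"
top = naive_retrieve(query_vec)
# The two winners come from different documents about different models;
# fused into one prompt, they can yield an answer that applies to neither.
print({c["source"] for c in top})
```

Both toaster chunks score above 0.99 against the query here, which is exactly the trap: similarity alone cannot tell you the two snippets describe incompatible realities.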
The second level: a user asks a question. Imagine, and this is an example I give a lot, that I have an electrical utility store: I sell toasters and refrigerators and whatever. A user comes to me and says, my toaster is broken. Okay, great, how do I help this user? I can go and look at my knowledge base of manuals and bring some toaster manuals to them and say, hey, here's the toaster manual. I can look at my warranties and say, here, I found you a warranty for a toaster, take it. But the reality is that I actually don't have enough information to help this person. So what would a normal human being do? A normal human being would disambiguate. They would ask: okay, wait, what kind of toaster do you have? What's the model and make? When did you actually buy this toaster?

Did you buy it from our store?

Did you buy it from our store, exactly. And when you have this information, now you can go: okay, I don't have to search through all of my manuals that maybe just mention toasters; I want to search the manuals that are particular to this specific model. And then maybe I ask the person, what problem did you run into? What happened? Did you smell smoke, or is it just not turning on? Then I'll fetch back the right manual that tackles the particular problem this user is asking about. And if I know when they purchased the thing, I'll be able to tell them: hey, your toaster specifically has a warranty of 12 months; looking at the time, it seems like it's still under warranty, so I can just exchange your toaster.
This is all to say that the process of figuring out which reality is applicable to any particular question a user could come up with is nuanced. You need to start thinking about the nature of your knowledge base, the nature of all the documents you're dealing with, and how you can leverage those to help the user not only find the most relevant information but also disambiguate the intent they're coming in with.

And you're saying that step is what a lot of people are skipping: this organizational piece.

Yes, I think a lot of people are skipping it, and I think they're skipping it because it's hard. It's difficult to start thinking about those things. It's much easier to say: give it to the LLM and the LLM will sort it out, or give it to the vector database and the vector database will sort it out. That's what we saw with the claim of, hey, LLMs now have a context window of a million tokens, so why don't you just throw all of your documents into the context window and you're done? The LLM will just figure it out. Well, it turns out that if you are searching for multiple facts within your context window, most LLMs are going to fail; the lost-in-the-middle problem is still very, very real. A single needle in a haystack is not a problem anymore, but multiple facts, multiple needles in a haystack, is very much a problem. So you can't just rely on that, let alone the cost and the economics of doing things that way. But the same applies to vector databases: you can't just throw everything into the vector database without thinking about the meta layer.
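The toaster flow above, disambiguate first, then search a narrowed slice of the knowledge base, amounts to metadata filtering plus a little business logic. A hypothetical in-memory sketch, not a real vector-database client; field names and documents are made up:

```python
from datetime import date

# Documents tagged with metadata at ingestion time: type and product model.
docs = [
    {"text": "T-200 heating element troubleshooting.",          "type": "manual",   "model": "T-200"},
    {"text": "T-100 crumb tray cleaning.",                      "type": "manual",   "model": "T-100"},
    {"text": "Toasters: 12-month warranty from purchase date.", "type": "warranty", "model": "T-200"},
]

def filtered_search(doc_type, model):
    """Metadata-filtered retrieval: shrink the candidate set *before* any
    semantic ranking, instead of searching everything that mentions toasters."""
    return [d for d in docs if d["type"] == doc_type and d["model"] == model]

def under_warranty(purchased, today, months=12):
    """Business logic on top of retrieval, using approximate month arithmetic:
    is the purchase still inside the warranty window?"""
    return (today.year - purchased.year) * 12 + (today.month - purchased.month) < months

# After the user answers "it's a T-200, bought in January":
manuals = filtered_search("manual", "T-200")
print(len(manuals), under_warranty(date(2025, 1, 10), date(2025, 9, 1)))
```

In a real vector database the filter and the similarity search run together over one indexed collection; the point is the same: the disambiguating questions become filter values, and the Franken-answer risk drops because incompatible sources never enter the candidate set.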
The meta-knowledge that exists on top of the vector database is what helps your agents or your systems navigate what's in it.

So, we've been hinting at it, but I want to get your explicit response. What is the bridge between AI agents and a vector database and RAG? Why has this become such a big deal in relation to AI agents and LLMs?

Conceptually, I think the bridge is that you just have to decide where authority lies. What do you trust? Do you trust a thing because an AI said so, or do you trust a thing because there's a trace that can tell you where a piece of, let's call it a claim, comes from? That's the strength of vector databases here, and where they're complementary to LLMs. LLMs reason; they do a really good job of reasoning, and they can reason about different claims. They can say the sky is blue, and they can say the sky is purple, and both claims will have the same authority: basically, the LLM said it. Of course you can go into the probabilistic distribution and the quote-unquote confidence of each of these claims, but in terms of what a consumer experiences, both of these claims are essentially the same. Whereas with a vector database, you get to peer into what's grounding the response. You're essentially saying: I have a claim here, and this claim comes from this particular database, and that database can have metadata associated with it, for example the author, the person who created it.
I can say what source it came from; I can ground the claim in something that lets me make the decision of whether or not I trust it. That's fundamentally different from just blindly accepting a claim because it's in the training data somewhere. And I think that is what makes vector databases, and RAG in general, really indispensable. If you want to build trusted AI applications, you have to ground them in some traceable, explainable citation that will lead you to the source of the information the LLM is using to make any kind of claim.

So when someone comes to you and says, hey, I've got Pinecone set up and it's not performing how I wanted it to, what is the first question you ask them? This is a slight variation on a question Jake asked earlier, but I wanted to ask it, so I'm going to ask it.

Well, there's a bunch of questions to ask. Again: do you have a good idea of what good looks like? That is perhaps the most important question. It means that you have, for example, a set of ground-truth questions that you know the answer to, and then you can start running an evaluation that says: okay, I asked this question; did I get the response I wanted? Can you then show me how the retrieval happened, what you actually asked versus what you actually got? And then, what do you do when you get an answer that is wrong? Can you identify the reason for the failure? Sometimes it's because you get multiple context snippets that are conflicting, and then you get a wrong answer. Sometimes the information is not even in the source, so you get a complete hallucination.
And a lot of the time, people don't know what actually happened. Was it a hallucination? Was it a conflict? Was it something else? I don't even know, because I don't have the instrumentation for that. So those are the main things: you must have the very basic notion of ground truth, of what I'm actually expecting to see, and then you have to have enough instrumentation in place to know what could go wrong and, for every failure, what did go wrong.

I like that you mentioned hallucinations, which I think is just interesting, because we're slowly getting there: the hallucination problem is still a problem, but it's less of a problem. I don't think I've ever really been able to understand, and I'm curious, Roie, if you have thoughts or maybe even know the answer: why does it happen? If we have all the supporting data, if we're feeding it with RAG, with relevant data, how do we still get hallucinations?

So I heard Andrej Karpathy say, and I'm pretty sure it was him, and I hope I'm not misattributing this to him, that all the LLM ever does is hallucinate. It doesn't do anything but that; it just so happens that a lot of the time, the things it says are reasonable.

Interesting.

And I think that has a lot to do with our expectations and, essentially, our trust in the system. Like I said before, in a RAG system, if an LLM hallucinated something and gave me a citation, and the citation was wrong, I immediately have the ability to say: okay, I'm dismissing this claim, because I can see the citation and the citation is wrong. But when an LLM makes a claim with no citation, it could be wrong, it could be right, and I couldn't necessarily tell the difference.
And that, I think, is the most pernicious way these models mess with us: our inability to say with confidence that something is either really right or really wrong. That's as far as claims go. But the bigger problem, I think, is that LLMs do a lot of things that are very useful, or perceived by us to be useful, where the question of wrong or right doesn't really come to bear so much. It wrote me a blog post, and the blog post reads really nicely, and it's really cool, and I'm kind of wearing blinders: I trust it because it's useful. We are leaving our critical brains behind, because these tools are so incredibly useful to us right now. That's part of what Pinecone is trying to do by introducing knowledge into the equation: to say there are two pieces in any kind of intelligence work we need to do. One is reasoning, and one is knowledge. The knowledge is the thing that needs to be traceable, explainable, trustworthy, authoritative. That means I can trace any particular claim up a stack and say: this person said this, and that person said it because that other person said it, and I trust it all the way up the chain, up to, say, a scientific publication that is trustworthy because it can predict a lot of things correctly and follows the scientific method. That is the root of all the things I trust; that is my knowledge, the thing that grounds me to reality. And then the LLM should reason over those things.
LLMs do this very well right where they take those things and kind of you know put them together and fuse them etcetera with that said again like what I talked about before there's still a lot of responsibility on us to know what we're feeding the LLM right because the LLM is gonna try and do the best that it can right to fuse claims together right like if you give it conflicting claims or worse right slightly conflicting claims right it's gonna just try and make this mish mash thing right and try and produce a result right and I know there's a lot of work around explainability etcetera to kind of understand how the LLM gets to the things that it gets to but it's also like how we direct it to do that work right like what do we expect these systems to do for us right like what are we comfortable accepting with no explanation and where are we going to sort of demand right that there's gonna be a lot more grounding and traceability yeah yeah it's less necessary to understand like you know the inner thoughts of the model's mind when you have a graph of the knowledge that it accessed in the order that it accessed it right yeah I love when you ask a question now I just habitually ask it for the sources that it pulled from right but I love when it just says it didn't have a source for this claim and it's like well I just made it up then ha ha do not want that no yeah I think I really like what you said that I don't think I've ever heard anyone say it that way that we trust it because it's useful which I think is exciting but terrifying like I think a lot of it's terrifying like that sums up a lot of the problem is that people trust it because you feel like you can do this magic trick and you wanna show it off you're like I trust like this is so cool I trust it this must be good yeah I mean I think that like you can think of it like a car you know what I mean like if
you or an airplane right mmm hmm like and I think you know Jerry Seinfeld makes this joke about you know like we're not supposed to be in the air it's kind of like the same way that fish aren't supposed to drive they just don't belong right but like you trust this crazy idea that aerodynamics is gonna take this flying metal tube and fling you up in the sky you know what I mean like and nothing is gonna go wrong and there are like tens of thousands of parts that could go wrong you know it could explode at any moment you know what I mean yeah but you trust it why because it's useful because it's proven to be a thing that moves you from point A to point B without fail right and again as long as you can make forward progress with the tool right you're gonna trust it more and more and more right even though it could fail in very very catastrophic ways the risk is worth it right yep yeah I like that um so I guess maybe the takeaway there is stop trusting it because it's useful trust it when you know the source trust it when you know what it's pulling from I mean I think that the tension already exists so for example for financial institutions for legal institutions etcetera right they are slower to adopt LLMs everywhere right in preparing briefs or writing financial reports you know what I mean why because the risk is much bigger right like where do you see the most LLM or gen AI penetration it's in all of those kind of lower risk areas right like people are building support bots people are building you know softer things that like okay if it fails it's not a big deal right and that is part of why like the statistic that you mentioned before where like you know 90% of all AI startups or pilots whatever fail that's kind of because of that right because you can't yet give it the responsibility to
do mission critical things why can't you do that it's because people don't trust it why don't people trust it well because it's not trustworthy it doesn't do enough right to garner trust right and that's really the biggest problem that we're seeing right now and we're trying to solve yeah yeah so one thing and this is slightly tangential one thing with the rise of agents is that accessing and writing databases is going to be happening at much higher rates than before right just because an agent can click through things much quicker than any of us can um have you had to account for that at all in designing Pinecone or is that something that has not entered your concerns just yet oh for sure I mean I think that's probably part of the impetus behind dedicated read nodes right that's the idea that there's gonna be a lot more reads than writes that is for sure right you're gonna have a lot more cases where these agents are gonna try and tap into that information that's stored in the database in one way shape or form like maybe they're making a natural language query or whatever like we're trying to do a recommender kind of run right like those activities are gonna happen a lot more um and that is one of the scenarios that kind of gave us motivation to push for that feature I really like to hear that cause that was my first question actually when you mentioned the dedicated read nodes cause I'm in the process of spinning up an authorization architecture for you know organization agents for my startup um and immediately like even before I have any of these in production I'm like oh boy I'm gonna have to have like you know a dedicated one yeah for my permissions database to make sure that you
know the agents can get the permissions that they need um at the speed that they need the good news is that before you get to literally tens of thousands of concurrent requests you probably are not gonna need a DRN um the real use cases that we're talking about are you know companies with legit millions of users who are trying to clamor onto their system but Pinecone itself even without that particular feature is fully ready to deal with very high throughput scenarios and again to do that from day one right like you don't need to worry about tuning anything etcetera right now I remember back when the very first um oh I can't remember what they called them but the very first agents that people started launching there was BabyAGI there was AutoGPT yeah AutoGPT I remember oh yeah we remember BabyAGI and we remember those days real quick that was like a moment at Pinecone where people were just like uh so basically someone in AutoGPT put Pinecone as like the default database and yeah like overnight people flooded in and we had a very very generous free plan back then um it's still generous but it was even more so to the point where it was unsustainable like oh my goodness hold on we cannot deal with this I'm curious was that crazy was that a surprise to you guys how quickly you ran into you know unsustainability issues yeah I think that back then that was like the first kind of signal that led to serverless right so like I don't know if you remember but back then Pinecone was pod based so you basically got a dedicated pod right which is essentially like an instance in EC2 or whatever that was just yours right yeah and that was not a thing that
you could sustain when you have like tens and hundreds of thousands of people kind of clamoring and starting their own and the problem was not that they got spun up it's that most of them got spun up and then they were used for like a day and then right spun back down or not even used oh this looks fun let me try it exactly right and so we had to come up with a much more fluid and sort of highly optimized architecture um and that led to where we are today um which is the slab architecture that's what I was gonna ask next is has that continued to pay dividends beyond that moment for sure yeah for sure so you should read about our slab architecture it's way too long to get into in this podcast but the basic premise is that when you move away from this rigid pod form there are a lot of things that you could play around with in terms of how things scale up and down and not just that but you have to understand that different customers have very very diverse ways of accessing their data um especially with AI I'd imagine that they're even more diverse exactly especially with vectors right um that require different optimizations in different parts of the architecture right and so you can't have this really monolithic way of looking at things right like everything has to essentially be tailored right to the customer that is using it and we can't have our hands on it right like it has to all happen automatically for the customer right as they use the system so that initial moment of surprise right like we were like okay it's not gonna work led us to a much more healthy and scalable place where COGS make a lot more sense and um you know the company is in a much much better position uh to be prepared
for what's coming next I love to hear that that's awesome and what do you think is coming next do you have any projections I'm curious your thoughts yeah I mean there is no doubt that we're going to start seeing much bigger workloads uh coming from autonomous systems um you know whether we call them agents or something else um but like the idea that there's not necessarily gonna be a human right driving a lot of these workloads I do think that for the foreseeable future um like I think two or three years um we're still gonna have a majority of humans driving the bigger workloads coming from our bigger customers right but as these autonomous systems become more and more mature it's just much easier to scale you know you can make more agents than humans um and so I think that we're just gonna start seeing that a lot more and I think that we're gonna start seeing new workloads um that go beyond just your run of the mill retrieval like for example memory which we can talk about a little bit you know that's a completely different beast right than anything we've talked about so far right cause then it's a matter of like again you could store so many things right and when you get new modalities coming in right so I'm thinking more in terms of okay what happens when you get um you know robots and AR glasses that are just continuously streaming in massive amounts of information um and we start thinking about and reasoning about them with world models not just um you know semantic models um what happens then right um and that is I think the next big workload type that we're going to see from your perspective and there may not be an answer here yet but from your perspective I'm curious what would be the difference between like you know agent memory
versus uh agent database whether it's vector or you know Postgres yeah I mean I think that there is a lot of correlation in the sense that you know at the end of the day it's gonna be um well some sort of mix of systems but there's gonna be a lot of text that's saved in some way and then retrieved but there's also going to be a lot of non text things that are going to be saved um I'm not sure exactly what they're gonna be I mean they're not necessarily just image embeddings but they're also like embeddings of some internal representation of like a world model that a robot has right for example as they're navigating a space and they're trying to figure out what did they just see etcetera right um and then there's the matter of the access patterns right which are again gonna be vastly different for every type of memory right like we can talk about different types of memory like everybody knows right there's episodic there's you know long term blah blah blah like each one of them is gonna have different access patterns you're gonna have behaviors that would seem counterintuitive to any database where you like need to forget things uh because it's kind of part of building memory um but there's also other processes that have to come into effect here which are auxiliary to memory which to me are what you would call learning right so memory is there to support learning right and that process is kind of the orchestrator of what gets written where what gets forgotten where um you know all these pieces sort of are substantially different from your run of the mill kind of retrieval stuff that we're doing now thankfully these are all really easy problems to solve super easy they have ready made solutions out there honestly yeah I don't know
what the big deal is we're pretty much there yeah yeah my dad started an enterprise document retrieval and tracking uh you know company years ago back in the very late 90s um and up until recently that's pretty much all he's been working on um and it's always so fascinating to me how deep this problem goes like it's really easy to in our mind underestimate how deep it goes but yeah we're 40 years into you know internet computing and fundamentally the question is still the same like how do we organize this information how do we access it um yep yep it's wild it is which is so I don't think it can be stressed enough how important it is like this is the brain that fuels everything this is the memory this is the reality that we're dealing with I mean it's very exciting and terrifying I feel like I use those words a lot with AI exciting and terrifying um when we get to like we recently had on someone who's based in Poland his name is Gregory and they're working on building humanoid robotics and they're trying to build open source models to help these robots interact with the real world and we discussed with him this hallucination problem we talked about earlier and we're like we better get that figured out before we unleash humanoid robots absolutely absolutely and that's where this comes in yeah I think you're a hundred percent right like I mean I think that AI has to solve its trust problem you know what I mean um and it goes beyond hallucination right hallucination is one aspect of it but it's really more about the relationship that we have um with any autonomous system right and I honestly think that so I heard um Judea Pearl say once you know I think he was speaking to Sam Harris and they were talking about you know like the Turing test
or whatever and I'm paraphrasing the conversation here like a lot but basically it was something along the lines of you know what does it matter right like if something isn't actually conscious if it's pretending to be conscious well enough for you to believe it right and yeah I'm of that mind like I think that it doesn't really matter right whether these systems are actually conscious or just pretending to be conscious right like if there's a walking talking humanoid robot that sort of can sound real and act real enough you know what I mean it's gonna feel right and like you know morally be something that I want to interact with in the same way that I interact with other human beings and if we can't figure out the trust mechanism between us yeah and these machines that's where you know the apocalyptic images of the Matrix come in right where it's like this opaque thing right and it's us and the machine and we don't know what they think we don't know what they want right like yeah we have to have this common ground right of how do we agree on what reality is and how we point to it right like and we make claims about it and we can either falsify or verify those claims yeah this all brings me back to my long held belief that I agree with Yann LeCun and I'm splitting hairs here but he talks about how AI won't become truly intelligent unless it has inherent grounding um and as much as I love LLMs and as much as I want to convince myself that you know we don't need that grounding I think that it's absolutely necessary either for the AI or for us to trust the AI um either way like we don't get to the next kind of level without that trust mechanism that you're talking about without disparaging Yann LeCun cause who am I to even talk about him you know what I
mean at all I just think that his view is very like uh human slash animal centric right like I would agree with that yeah there is a world where we create a completely foreign type of intelligence and I think we have to some extent right that doesn't look or behave like anything that we know and it doesn't stop it from being intelligent it's just not intelligent like we are or like life forms that we know are but if I were to imagine like you know aliens coming from wherever right they could be intelligent in completely different ways right yeah and it doesn't make them any less intelligent or like that we have any less of a moral obligation towards them or with them etcetera right so similarly I think that people and I got in this debate on Twitter the other day I think that people over focus on a metaphysical view of thought and don't focus enough on okay what are like you know the mechanical tangible material aspects of thought that can be measured um yes sure maybe LLMs don't quote unquote think but they get close enough that I'm like you know what's the difference exactly it's like if it's doing all the things that are functionally equivalent to thinking right and I would go even a step further right like if you take specifically um LLMs that build code right code is just like a way so if you think about you know type systems right as a way to describe the world essentially right yeah um and functions as a way to basically describe solutions to problems basically if you have some input how do you produce an output right then to some degree and I'm not saying this is like fully AI or anything but to some degree we have an intelligence that can generalize over coding problems meaning it could theoretically generalize over anything that could be formalized into a type system right which is mind boggling right yeah and we're
like oh yeah like this is just on my computer now you know what I mean and we're kind of like no still big you know it's kind of like the same level of insanity of us flying in the air in a you know metallic cylinder you know and being like yeah this is totally normal you know what I mean it's really not normal and I think that we are only now right like six months a year after it started being a thing only now people like quote unquote normies are seeing like oh shoot like this is actually really interesting and has insane implications on society etcetera etcetera and we're I don't know at some level we're kind of afraid to call this general intelligence but that to me is the closest thing to it you know what I mean the only thing that is lacking right and I think that is not a terribly hard thing to simulate at the very least is willpower yeah you know what I mean like the fact I agree the fact right that right now AIs don't want anything quote unquote is kind of whatever like how hard would it be to again simulate I'm not saying like create an actual thing that wants things but actually just simulate it right yeah just like imagine a thing that pretends as if right it wants things totally and then I would ask what's the difference if you can't see again Turing test it right like if you can't see the person operating right and what's the actual underlying mechanism behind it wanting things then what's the difference yep I saw this article kind of related to this where they were talking about this crazy phenomenon of artificial relationships right girlfriends boyfriends online and they kind of hinted at what you're talking about that it's almost like we're achieving this that these AIs are being built and engineered so that let's say I go buy this AI girlfriend like that is going to be tuned to like seek my
approval it's going to seek whatever makes up a relationship they're continually tuning those models to try to seek that and build that with you and so I think it's like we're already there we just haven't done it generally ha ha yeah I mean this throws me back to like I'm a huge Star Trek fan but uh you know I love like I mean one of the most famous TNG episodes The Measure of a Man you know like we're at that moment when we're asking ourselves like we're looking at this machine which we all know is a machine right but it's doing all the things that people do right to a certain extent and I acknowledge right like there's still gaps right but I think those gaps are not conceptually hard to bridge right and then really the last remaining gap is gonna be a leap of faith and there are some people who are willing to take that leap of faith right and say yep this is like a person I want to be involved with right I'm not there I'm not interested in that I have my reservations on that but whoever does I can't really like you know judge them cause they're just taking a leap of faith that I'm not ready to but I'm taking other leaps of faith right yeah when I let my ChatGPT like write a blog post for me or when I uh you know talk to it about I don't know my bank account or people that I know use it as a therapist or you know things like that right yes we all know it's not a thing but we pretend as if it is and by the way it's funny cause we do this thing with all sorts of other things in our lives like money you know what I mean we all know that money is not a real thing but it's rather like a societal convention that keeps society alive you know what I mean yep and that's okay right because it's a useful hallucination and that makes it real or it makes it real yeah and this is true for like if you read
Yuval Noah Harari like he will tell you right like a lot of you know human history is filled with all sorts of very useful narratives that have cohered in order to facilitate some societal advancement you know what I mean and I don't think this is a lot different from that totally I'd agree with that yeah sillier example like I used to play the game Destiny 2 a lot have not played it in a number of years I'd still feel devastated if I found out tomorrow that my main character was deleted right like yeah yeah and that's just a silly video game avatar that I haven't touched in years and totally yeah totally we inject ourselves into technology and blind ourselves to it yeah for sure yeah Roie this has been so fun and yeah same I'm excited to see what's next for you and Pinecone things are moving fast who knows where you'll be in six months or a year might have to have you back on yeah and I would love to come back if people wanna follow you or follow Pinecone what's the best way for them to do that um I think the best way is to find me on LinkedIn these days um and Pinecone just go to our website pinecone.io um we have a bunch of very interesting material in our learn section we have an awesome blog um but really most of all just try the products um and tell us what you think um we have a Discord channel that you can join our team is there happy to help with anything that you get stuck with um yeah that's it we'll drop links to everything thank you so much for joining us perfect thanks for having me we'll stay in contact alright see you