00:01.76 archpodnet Welcome back to the ArchaeoTech Podcast, episode 213. We're talking about the data curation crisis, so to speak, and so, leading off with the question I asked you at the end of last segment, Paul: can there be essentially a universal container for data? One of the things leading up to that that I can bring up is there's been a number of roads converging on the same spot for me in my life as an archaeologist, and the first one was getting into digital data collection. Well, I guess if you go back even further, it was my first few years in archeology doing CRM. I mean, I've worked in a total of about 18 or 19 different states, and every single one of them has a different site form. Now, out here in the West it's a little bit more similar, because a lot of the Intermountain region ones were based on the same form, but they've. 00:52.80 Paul Um. 00:57.28 archpodnet Kind of diverged from there, but they all still collect the same information, just in different fields, which leads me to what I'm about to say here. But everywhere you go, there's a different way to record sites. Now, some of them have to have that, because in some cases you're not actually recording sites out in the field. For example, with shovel testing, you're really defining sites mostly when you get back into the office and you're trying to figure out what you have. Your shovel test might initially define a site boundary, but after that you're not really filling out a site form, so to speak, in the field like you do out here in the West a lot. But that being said, you're still recording the same things. 01:16.51 Paul Listen.
01:34.84 archpodnet Still recording all the same stuff. And when I was looking at producing an application, and I've gone down this road in four different ways, in every single case, my latest one being with Wildnote a few years ago, I was trying to create a universal site form. I went to the point where I had a whole spreadsheet, basically with the actual fields from the state forms on one axis and my universal site form fields coming across the other axis, trying to figure out: are there any truly unique fields on these site forms that can't be dropped into a universal container? And. 02:09.94 Paul Me. 02:12.67 archpodnet In most cases the answer was no, and in most cases the reason the site forms were so bespoke is that they just got into incredible detail on the actual site form, whereas some were a little more generic: is this a feature, is it prehistoric or historic, and that's kind of all they cared about, as long as there was a description. But you get to states like Utah and Nevada, and they want a massive amount of detail, at least on their older site forms. The newer ones are a little more succinct, which ought to tell you something to begin with. But yeah, it was all leading towards: can we make a universal site form that would then start leading us towards this universal data collection and storage method? Because I feel like we can't even talk about a universal way to store data without having a universal way to collect data. You know what I mean? It's another thing, talking about Wildnote and doing GIS and stuff like that. 03:06.41 Paul Um. 03:10.87 archpodnet And understanding things like data dictionaries. I mean, every company has its own data dictionary for their Trimble, usually, right? And it's like, why do we all have different ones.
Why isn't it all the same one, mandated by the agency that you're working with or something like that? And some may, but in general. 03:17.53 Paul Um, yeah. 03:27.52 archpodnet A lot of companies just have their own, and doesn't anybody see a problem with that? That's a different question, but do you think we could do something like a universal site form, or would that be too restrictive, similar to what I asked you in the last segment? 03:40.27 Paul Yeah, I think my answer is going to be the same as what I was saying in the last segment: either it's going to be too restrictive, and you then have to force things into fields where they don't belong, kind of like those asterisks on people's names, and that's not because that database was restrictive but because. 03:50.89 archpodnet Yeah, yeah, yeah. 03:58.46 Paul The administrators didn't know of and didn't see that date-hired field, so in effect it was restrictive. So you either end up with a situation like that, and then people are going to be shoehorning data into places it doesn't fit in order to have it recorded. 04:00.99 archpodnet Mm. 04:13.63 Paul Or else, like I was saying, you make it all wide open, and then you don't see the fields that you need. I've dealt with systems like that, where I have to scroll through page after page to find the bit that I'm actually recording on this site. 04:20.36 archpodnet Right. 04:29.54 archpodnet Oh yeah. 04:31.59 Paul Because all the other possibilities were listed. Neither is perfect, and so that's kind of where I'm falling now, with the bespoke not being terrible. But maybe there is some intermediate there, where whatever federal agency is mandating it says it needs these things recorded in this way, and whether you're doing a historic site, a prehistoric site.
You're doing a survey for who knows what, you're doing shovel tests versus one-by-ones, and so on, you select and say, okay, we're going to need data X, Y, and Z for this one and A, B, and C for that one. Oh, and they're both going to need Q. So different forms presented to the people actually collecting the data, forms that strip out the extra garbage but present what's absolutely necessary in the most visible way possible. That's kind of my feeling at this point. 05:32.45 archpodnet Yeah, well, I feel like, and I think people inherently know this, there is an acknowledged difference between collecting data and analyzing and interpreting data, right? There's definitely a big difference there, and. 05:47.40 Paul Yeah. 05:51.81 archpodnet It's leading me to think about another piece of software that I work with, kind of my new day job, so to speak, and it has nothing to do with archeology. But we've got a new feature. Let me just give you an example here. We build modules, basically, that handle a concept. So let's say you've got. 05:57.82 Paul Um, yeah. 06:10.60 archpodnet One module that's doing one thing. In general, it's about the same for everybody. It's like recording a site: it's about the same for everybody, with some slight differences. Well, there's this new AI summary feature where it will basically take all of the forms that were put together for this one record, as we call it, and there could be any number of subforms and pieces of data and. 06:17.73 Paul Um. 06:29.37 archpodnet And different ways different companies have rephrased things and done stuff to just fit their own needs, very similar to what we're talking about. But the AI summary just looks from beginning to end and summarizes this entire record in human-readable terms, and it'll mention, for example, first names in. 06:33.70 Paul Um.
06:46.57 archpodnet The first paragraph, and maybe a few paragraphs down it'll just go down to first name. It gets very conversational, very familiar, and it puts it in this really easy-to-understand, shockingly good summary of the entire record, down into about four or five paragraphs. And I'm kind of wondering, man, if we just had a universal way to collect everything, in a way that didn't limit what we wanted to say about it, that still allowed description, still allowed things like that, and then instead of doing some sort of overly academic, complicated analysis, just see what AI can do with it. Just have a summary button. I'm half tempted to make an archeology module out of this software, because that's what I do, I configure it, just to see how it would interpret certain. 07:27.86 Paul Um. 07:38.57 archpodnet Archeological sites if data were put in there, right? Just to see what it would say. It wouldn't be able to interpret in the context of everything else that's been collected out there, because most of that is behind some kind of wall. You could use ChatGPT all day long, but it doesn't know anything when it can't get to the actual data, right? So it's only as good as what's in there. But. 07:46.61 Paul Never. 07:56.75 archpodnet I just wonder if our future lies somewhere down that road, where we really are archaeology data scientists and we're just collecting data to put into this database that will be analyzed and interpreted from a much bigger standpoint, in a way that we never could. 08:11.11 Paul Possibly. I mean, I'm actually thinking of it the other way around. I wonder if AI could be used to help streamline that process, so that you're only collecting the necessary data and you can do it in the easiest way possible. Now, the backstory to this is that my dad was a physician. 08:19.10 archpodnet Mm.
08:30.98 Paul And in the 1990s he was working with a large pharmaceutical company to develop what we called back then an expert system, in order to collect data on diabetes patients and to monitor them, with their meds, their weight, a lot of different. 08:37.64 archpodnet Oh. Yeah. 08:50.29 Paul Factors around their health, so that that expert system could be distributed to rural doctors, to general practitioners, to people who weren't endocrinologists, who weren't specialists in diabetes care, in order to make sure that things didn't fall through the cracks. And yeah, so that's kind of the same sort of thing that I'm thinking about now with archeology. Maybe you could use AI to generate those, I'll still call them expert systems, though I'm sure there's a different term of art now, in order to help. 09:23.53 Paul Streamline and make more efficient and make more comfortable. I mean, I actually hate recording data in the field. It always stresses me out. Even the simple stuff like Munsell stresses me out to no end, because I never see the same colors as any of the damn chips. But if I could have some way of. 09:28.28 archpodnet Um. 09:37.23 archpodnet Right, right. 09:42.98 Paul Getting the right prompts and being prodded along in making the decision, you know, it's 10YR 4/3, not 4/4, that would definitely help me. 09:51.36 archpodnet Yeah. 09:56.59 Paul So, you know, maybe we do it at both ends. I mean, I'm sure that's going to be happening, right? I'm sure that what you're talking about for the analysis part is starting to happen already, or at least a synthesis, if not full-on analysis: turning it into something that somebody else can read and understand the basics of, like reading the summary or the abstract of an article. That could be useful. And then having it at the start end, for helping collect the data.
That's interesting. I don't know where we're going to go with that, but I'm sure that that's going to be worked on. 10:17.82 archpodnet Yeah. 10:32.31 archpodnet Yeah, I have no doubt. God. 10:37.18 Paul So another thing before we round out that I just wanted to bring up here is that something that runs through a lot of these articles is that we have an ethical obligation to share collections and data beyond just our profession, and that gets to your question here about the AI. One of the articles starts out pointing out the obvious, I mean, it's anthro or archaeology 101: what separates archeologists from looters and tomb robbers and pot hunters in general is that we record things. And because we record things, and we record things in as much detail as possible, we've kind of dug our own hole, in a good way. I mean, I'm glad that we're recording so much. I'd rather have it that way than the other way around. 11:28.52 archpodnet Um, yeah. 11:31.80 Paul But we absolutely have to have that ethical responsibility to think through what the rest of that means. And then another related thread, which gets brought up in one or both of these, is the FAIR and CARE principles. Let me quickly pull open a web page here. So FAIR is findable, accessible, interoperable, and reusable, and CARE is collective benefit, authority to control, responsibility, and ethics. I'm glad that those have worked their way into a lot of our discussions. In fact, it was one of the things we had to. 11:51.27 archpodnet Oh. 12:07.69 Paul Say in the NSF proposal: hey, how is our data management plan going to address these? So it's becoming foregrounded in a way that it wasn't in the past, and I think that's only a good thing for our field. How do we deal with this deluge? Unfortunately, I don't have good answers.
And at least in none of the articles I read do I recall there being good answers. There were a lot of good proposals, things we should be doing, things that we really should be doing better, but not a one-size-fits-all, hey, if you just do X you will suddenly have good, reusable, well-curated data. It's much more complex than that. Do you have any ideas, Chris, aside from holding out the hope that AI fixes it for us, of how we might start addressing these things? 12:50.36 archpodnet Right. 12:57.68 archpodnet Well, I'm not sure there is an answer, to be honest with you, only because we have so many different places, going back to one of the very first things I said, and that I actually read in this issue. There are so many places to actually store the data, so many agencies that are mandating the collection in a very certain way, and for all this to happen, I mean, we would need almost a federal-level understanding of, hey, you're going to collect and store things this way. And then even that breaks down when you talk about private projects in, say, California and places like that. There's just almost no way to do it. So I think rather than having AI summarize a site record or something like that, like I was mentioning, I think AI would probably come in. 13:49.28 Paul Um. 13:50.12 archpodnet Come into play in actually just analyzing these datasets. So you can query the dataset with an AI rather than, say, with an actual database, right? And you just say it in human-readable terms: hey, was there. 13:52.64 Paul Oh. 13:58.49 Paul Are. 14:05.29 archpodnet Anything found that's similar to what I've got over here? I mean, you literally should be able to just say that, right? Because it can understand what you've got and understand what's in the other thing, and then just let you know if you need to dig any deeper, right?
Because I don't think we're going to get to a point, not for a really long time, where everything is collected in a way that. 14:09.50 Paul Um, right. 14:24.43 archpodnet Is universally searchable, right, and understandable by people, let alone all the past collections that we'd have to worry about. Those are gone unless something else can organize them. I think that's where we're headed. I think that's where we'll have to be headed if we want to understand large data. 14:32.26 Paul Um. 14:40.96 archpodnet And how all that's working. I think AI is going to solve that problem for us. 14:45.62 Paul Well, I think we're going to have to wrap this up. I could probably go for another hour. But I'm looking at our notes here, and I've got more things still unchecked that I would like to talk about than checked. 14:58.70 archpodnet Right. 15:01.55 Paul So maybe we'll revisit this and hit some of these other ones in an upcoming episode, but there are some interesting things that I learned in reading these articles that I would like to discuss with you some more. 15:13.77 archpodnet Yeah, indeed, me too. And maybe we can get our producer to take a look at this and see if we can bring a few of these people on to talk about these subjects further, because just looking at the names of the people who have written some of these articles, we've interviewed. At least a handful of them. I recognize at least four or five different names on here that I know we've interviewed, if not on this show then on other shows, and so I think we can get some people on here to talk about this. Josh Wells, I was just emailing with him. Yeah, so. 15:29.19 Paul Um, yeah. 15:40.30 Paul Yeah I do her. But yeah, I'd always rather have experts talking about their work and their understanding than me bloviating. 15:51.63 archpodnet Yeah, me too.
Yeah, if only AI would just summarize my thoughts. I could just blather on a podcast and then have something intelligent come out. There you go. All right, well, thanks, Paul. It's good to have you back on, and hopefully we can keep this going this time. 15:59.79 Paul There you go. Ah. 16:03.18 Paul Thanks, Chris. Good to be back. All right, take care. Thank you, bye. 16:09.34 archpodnet Right. Thanks a lot, everybody. Bye.
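
Two ideas from this segment lend themselves to a small illustration: the spreadsheet Chris describes at 01:34 (state form fields on one axis, universal fields on the other) and the middle ground Paul describes at 04:31 (each kind of fieldwork sees only the fields it needs, plus whatever everything needs). The Python below is only a rough sketch of how those two pieces might fit together. Every state name, field label, and work type in it is made up for illustration and does not come from any real site form or agency requirement.

```python
# Hypothetical sketch: all labels and field names below are invented examples.

# Chris's spreadsheet idea: each state's own label maps onto one shared field name.
CROSSWALK = {
    "state_a": {
        "Site No.": "site_id",
        "Cultural Affiliation": "period",
        "Narrative": "description",
    },
    "state_b": {
        "Site Number": "site_id",
        "Time Period": "period",
        "Site Description": "description",
        "Soil Color": "munsell",
    },
}

# Paul's idea: X, Y, Z for one kind of work, A, B, C for another, and Q for both.
COMMON_FIELDS = {"site_id", "description"}
REQUIRED_BY_WORK_TYPE = {
    "shovel_test": {"depth_cm", "munsell"},
    "surface_survey": {"period", "feature_type"},
}


def to_universal(state, record):
    """Rename a state-specific record to shared field names.

    Fields with no mapping are returned separately instead of being
    shoehorned into a field where they do not belong.
    """
    mapping = CROSSWALK[state]
    universal, unmapped = {}, {}
    for label, value in record.items():
        if label in mapping:
            universal[mapping[label]] = value
        else:
            unmapped[label] = value
    return universal, unmapped


def fields_to_show(work_type):
    """List only the fields a recorder needs to see for this kind of work."""
    return sorted(COMMON_FIELDS | REQUIRED_BY_WORK_TYPE[work_type])


def missing_fields(work_type, universal_record):
    """List required fields that have not been filled in yet."""
    return sorted(set(fields_to_show(work_type)) - set(universal_record))
```

For example, passing a `state_b` record through `to_universal` and then calling `missing_fields("shovel_test", ...)` on the result would list whatever shovel-test fields still need recording. Keeping unmapped fields in their own bucket is one possible answer to the "truly unique fields" question Chris raises; it is a design choice made for this sketch, not something either host proposes.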