2/4/2023: Regular Expression Search to Get the Line Count in a Text File
Running the regular expression search:
\S+\s*$
. . . in a text editor like Notepad++ will generate a count of the number of non-empty lines in a text file. The number of hits will be shown in the results pane below the transcript.
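To get a count of the number of empty lines, you can run this RegEx search: ^\s*$
Note that because \s also matches line breaks, a run of several consecutive empty lines may be undercounted. The search ^[ \t]*$ is a stricter alternative that only allows spaces and tabs on the line.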
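If you'd rather get both counts outside of a text editor, here is a minimal Python sketch that applies the same two patterns one line at a time. The function name count_lines and the sample text are just my own illustration, and testing each line separately sidesteps any question of \s running across line breaks.

```python
import re

# The same two patterns used in the editor searches, applied per line.
NON_EMPTY = re.compile(r"\S+\s*$")   # at least one non-whitespace character
EMPTY = re.compile(r"^\s*$")         # nothing, or only whitespace


def count_lines(text):
    """Return (non_empty, empty) line counts for a block of text."""
    non_empty = 0
    empty = 0
    for line in text.splitlines():
        if NON_EMPTY.search(line):
            non_empty += 1
        elif EMPTY.match(line):
            empty += 1
    return non_empty, empty


if __name__ == "__main__":
    sample = "alpha\n\n  \nbeta gamma\n\n\n\nend"
    print(count_lines(sample))  # (3, 5)
```

Lines that contain only spaces or tabs are counted as empty here, which matches what the ^\s*$ search treats as empty.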
2/11/2023: Getting Litigation Support Tips From ChatGPT
The artificial intelligence chatbot ChatGPT, developed by OpenAI, has been in the news quite a bit since it was released late last year. (Despite the company's name, ChatGPT itself is not open source.) Whether you work with computers for your job or not, it’s really something that you need to check out. If you’re familiar with the range of assistance that AI devices like Alexa or Google Assistant can provide, you’ll be in for quite a surprise at how much better ChatGPT is. It’s really a game changer, and is pretty close to a substitute for attempting to find answers to problems by running Google searches.
You can create an account for ChatGPT and begin using it very quickly. See this link: https://chat.openai.com/auth/login
It does require that you provide a phone number, and comes with the disclaimer, “the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.” It has limited knowledge of the world and events after 2021. The information that you provide it with will not be protected from disclosure.
I decided to begin testing ChatGPT’s usefulness to litigation support professionals by asking this question: “Can you explain the electronic discovery reference model to me?”
That’s not a bad response, but including identification, collection, and preservation as part of information management is an odd choice, and I’d really wonder about how experienced someone was in electronic discovery if they chose to answer this way. It also would have been more impressive if ChatGPT provided an image of the EDRM.
I next decided to ask ChatGPT a more specific technical question. I wasn’t really expecting it to provide a good response to a request for even a basic Regex script, but it surprised me.
There’s no doubt about it: ChatGPT’s answer, \b\d{9}\b, does work.
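To show what that pattern actually picks up, here is a short Python sketch. The pattern is ChatGPT's; the function name and the sample text are my own made-up illustration. It matches runs of exactly nine digits that stand on their own, so a ten digit number or a number glued to a letter is skipped.

```python
import re

# \b marks a word boundary, \d{9} is exactly nine digits.
NINE_DIGITS = re.compile(r"\b\d{9}\b")


def find_nine_digit_numbers(text):
    """Return every standalone nine-digit number found in the text."""
    return NINE_DIGITS.findall(text)


if __name__ == "__main__":
    sample = "ID 123456789, phone 5551234567, code A123456789, ref 987654321."
    print(find_nine_digit_numbers(sample))  # ['123456789', '987654321']
```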
My next question asked the AI bot to craft a more complicated regular expression search:
It even gives the option to copy the Python code. I engaged with ChatGPT further, asking for guidance on how to actually run the Python script, and it didn’t let me down.
This is correct, although figuring out how to get Windows to recognize Python in Command Prompt after I installed it was a whole other problem that I decided to work out on my own. The solution should be part of an upcoming tip of the night. After I got Python to be recognized in Command Prompt I was able to successfully test out ChatGPT’s script.
I don’t like to promote the replacement of litigation support professionals, or this blog for that matter, but there’s just no ignoring what a useful tool this is. I think we’re at an inflection point. It will not be possible to work without AI tools like ChatGPT going forward, any more than it was possible to ignore the need for internet service in the late 90s, or the need for a smartphone after the introduction of the iPhone by Apple in 2007.
ChatGPT generated answers to my questions in a few seconds, and archived its responses. I can’t imagine anyone in our field doesn’t regularly expand upon their knowledge of Excel, document review platforms, or electronic discovery in general by running Google searches. Now, you’ll be doing this with an AI tool that will provide more specific answers to your questions than Google or the whole world wide web ever could.
2/18/2023: Regex Search to Find Files Not Containing a String
You can run a regular expression search which will find any files which do not contain a word or phrase that you designate.
In this example we have a set of text files, only one of which contains the word, ‘panda’.
Using a text editor like Notepad++ we run the following search to locate any files which do not have this string:
(?s)\A((?!panda).)+\z
. . . and use the Find All option in the Find in Files tool. Only the files in the searched folder which do not contain the string will be listed in the results pane.
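The same search can be sketched in Python if you want to run it against a folder without opening an editor. This is only an illustration under a few assumptions of mine: the function name files_without, the *.txt filter, and UTF-8 encoding are all hypothetical choices. Python's re module uses \Z for the end of the text, so I've swapped it in for \z.

```python
import re
from pathlib import Path


def files_without(folder, needle, glob="*.txt"):
    """List the names of files in folder whose text never contains needle."""
    # Same idea as the editor search: from the start of the file to the end,
    # no position may begin the needle. (?s) lets . match line breaks.
    pattern = re.compile(r"(?s)\A((?!" + re.escape(needle) + r").)+\Z")
    hits = []
    for path in sorted(Path(folder).glob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        # The + requires at least one character, so a completely empty
        # file is not listed, just as with the original pattern.
        if pattern.match(text):
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder you want to search.
    print(files_without(".", "panda"))
```

For very large files a plain test like `needle not in text` would be simpler and faster; the regex version is shown only because it mirrors the search above.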
2/25/2023: AWS Kinesis Data Firehose
Kinesis Data Firehose (KDF) is an Amazon Web Services data transfer service which can move streaming data to data storage. It will extract data, transform it, and load it in the cloud. When data must be imported from multiple sources that are generating data in real time and ingested for processing before ending up in a data store, KDF acts as the conduit. So as data is generated by user inputs on the web (clickstream data), enterprise applications, smartphones, or other sources, KDF can compress and encrypt the data before transferring it to Amazon cloud storage, such as an Amazon Simple Storage Service (S3) bucket, or to an outside location.
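As a rough sketch of what sending a single event into a delivery stream might look like from Python, the function below wraps the put_record call of the boto3 Firehose client. The stream name, the event fields, and the send_event helper are all hypothetical, and the client is passed in so that nothing here assumes how your AWS credentials are set up.

```python
import json


def send_event(client, stream_name, event):
    """Send one event to a Firehose delivery stream as a line of JSON."""
    # Firehose expects the record payload as bytes. A trailing newline
    # keeps records separated when they are written out together.
    data = (json.dumps(event) + "\n").encode("utf-8")
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": data},
    )


# Example usage (requires boto3 and AWS credentials; names are made up):
#   import boto3
#   client = boto3.client("firehose")
#   send_event(client, "example-stream", {"page": "/home", "user": "u1"})
```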
KDF can transfer data at several GBs per second, and will back up the data at three different locations in an AWS cloud data region. (Amazon has 6 regions in the United States: Northern Virginia, Ohio, Northern California, and Oregon, plus the two GovCloud regions, US-East and US-West.) AWS only charges for the data that is ingested via KDF; there is no minimum fee.
KDF can convert JSON data to columnar formats such as Apache Parquet or Apache ORC in order to save storage space. It can also convert .csv files or data from structured databases to the JSON format, typically by way of an AWS Lambda transformation function.
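Here is a minimal sketch of what a .csv to JSON transformation could look like as a Lambda function attached to a delivery stream. Firehose hands the function a batch of base64-encoded records and expects each one back with its recordId, a result, and re-encoded data. The column names in FIELDS are hypothetical; yours will depend on the data being streamed.

```python
import base64
import csv
import io
import json

# Hypothetical column order for the incoming .csv lines.
FIELDS = ["entity", "amount", "timestamp"]


def lambda_handler(event, context):
    """Turn each incoming .csv line into a line of JSON."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8")
        row = next(csv.reader(io.StringIO(line)), None)
        if row is None or len(row) != len(FIELDS):
            # Hand the record back untouched and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        payload = (json.dumps(dict(zip(FIELDS, row))) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("ascii"),
        })
    return {"records": output}
```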
So, for example, financial transaction data can be sent automatically through KDF to an S3 bucket in its raw state, or KDF can transform the data, summing up sales to a particular entity, and then store the data in the bucket.
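The summing step in that example comes down to something like the sketch below. It is plain Python with made-up field names (entity and amount), and it says nothing about where in the pipeline the step would run; that would depend on how the transformation is set up.

```python
from collections import defaultdict
from decimal import Decimal


def sum_sales_by_entity(transactions):
    """Total the sale amounts for each entity in a batch of transactions."""
    totals = defaultdict(Decimal)  # a missing entity starts at Decimal("0")
    for tx in transactions:
        # Decimal avoids the rounding drift that floats give with currency.
        totals[tx["entity"]] += Decimal(str(tx["amount"]))
    return dict(totals)


if __name__ == "__main__":
    batch = [
        {"entity": "Acme", "amount": "19.99"},
        {"entity": "Acme", "amount": "5.01"},
        {"entity": "Globex", "amount": "100"},
    ]
    print(sum_sales_by_entity(batch))
```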
KDF is designed to ensure that data collected on servers from hundreds of sources does not get lost when the servers go down. It will scale up as more streaming data is generated. Data is actively backed up in the cloud as it is generated.