Blog

Automate PDF Data Extraction using Gemini 1.5 Flash and Google Apps Script

    Small to mid-size businesses receive PDF invoices from vendors and need to store details like invoice number, date of invoice, total amount, and vendor name. Doing this manually for all your vendors is a tedious and time-consuming process. In this blog we are going to be automating the process of extracting details from PDFs and populating the Google Sheet using the Gemini 1.5 Flash model and Google Apps Script.

    The Problem with Manual Invoice Processing

    Manually processing invoices is a tedious and inefficient task. It involves poring over countless PDF documents, meticulously extracting data, and entering it into spreadsheets or databases. This process is not only time-consuming but also prone to human errors, which can lead to costly consequences.

    Imagine a scenario where your accounts payable team spends hours each week manually processing invoices. This not only delays payments but also increases the risk of errors, potentially damaging relationships with your vendors. Moreover, manual data entry ties up your team’s time, preventing them from focusing on more strategic tasks that drive business growth.

    What can this Automation do?

    This function is designed to automate the process of manual data entry inside of Google Sheets. This function uses the Gemini 1.5 Flash model to extract required/ specific details from the Google Sheet and populate the Google Sheet. Here’s a high-level overview of how the solution works:

    1. PDF Conversion: The Google Apps Script converts your PDF invoices into a format that Gemini 1.5 Flash can easily understand.
    2. Data Extraction: The script interacts with Gemini 1.5 Flash, providing it with the converted invoice data and instructions on what information to extract.
    3. Spreadsheet Update: The extracted data is seamlessly populated into a Google Sheet, ready for further processing or analysis.

    Now that we have understood the problem and the possible solution, let’s get coding.

    Sample Google Sheet

    The Google Sheet that I will be using for this blog contains the following details:

    Write the Automation Script

    While you are in the Google Sheet, let’s open up the Script Editor to write some Google Apps Script. To open the Script Editor, follow these steps:

    Click on Extensions and open Apps Script:

    This brings up the Script Editor:

    Now, enter the code:

    function processPdfToSheet() {
     var archiveFolderId = "YOUR ARCHIVE FOLDER ID"; 
     const folderId1 = "YOUR FOLDER ID";
     var folder = DriveApp.getFolderById(folderId1);
     var files = folder.getFiles();
    
     while (files.hasNext()) {
       var file = files.next();
       if (file.getMimeType() === MimeType.PDF) { // Filter PDF files
         var fileId = file.getId();
         var pdfContent = convertPdfToGoogleDoc(fileId, folder);
         var responseData = sendToGemini(pdfContent);
         var details = extractFields(responseData);
    
         // Update Google Sheet with extracted details
         updateSheet(details);
    
         // Move the original PDF and the converted Google Doc to the archive folder
         var archiveFolder = DriveApp.getFolderById(archiveFolderId);
         moveFileToArchive(file, archiveFolder);
       }
     }
    }

    The convertPdfToGoogleDoc() function processes a PDF file by retrieving its content, creating a new Google Doc with extracted text, and then deleting the converted Google Doc to avoid clutter. We use the Drive API and DocumentApp to perform these operations, ensuring accurate text extraction. 

    We then return the extracted text content, which can be further processed or used in applications.

    function sendToGemini(pdfData) {
     const GEMINI_KEY = 'YOUR_GEMINI_KEY';
     const GEMINI_ENDPOINT = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=${GEMINI_KEY}`;
     var headers = {
       "Content-Type": "application/json",
       "Accept": "application/json"
     };
     var requestBody = {
       "contents": [
         {
           "parts": [
             {
               "text": `extract the following details: Vendor Name: Invoice Number: Amount Due: Due Date: Description Tax: \n${pdfData}`
             }
           ]
         }
       ]
     };
     var options = {
       "method": "POST",
       "headers": headers,
       "payload": JSON.stringify(requestBody)
     };
     try {
       var response = UrlFetchApp.fetch(GEMINI_ENDPOINT, options);
       var datanew = JSON.parse(response.getContentText());
       return datanew;
     } catch (error) {
       Logger.log('Error calling Gemini API: ' + error);
       return null;
     }
    }

    We then use the sendToGemini() function to interact with Gemini 1.5 Flash to process and extract data from the text content. We construct a request that contains our prompt that specifies the details we want to extract from the content and then send the request to Gemini 1.5 Flash. 

    In case of errors, it logs an error message and returns null. If successful, it returns the extracted details that we received from Gemini 1.5 Flash. 

    function extractFields(datanew) {
     if (!datanew || !datanew.candidates || !datanew.candidates.length) {
       Logger.log('No valid data returned from Gemini.');
       return {};
     }
    
     var textContent = datanew.candidates[0].content.parts[0].text;
     textContent = textContent.replace(/- /g, '').trim();
     var lines = textContent.split('\n');
    
     var details = {};
     lines.forEach(function (line) {
       var parts = line.split(':');
       if (parts.length === 2) {
         var key = parts[0].replace(/\*\*/g, '').trim();
         var value = parts[1].replace(/\*\*/g, '').trim();
         details[key] = value;
       }
     });
    
     return details;
    }

    We use the extractFields() function to process the response from Gemini 1.5 Flash to extract relevant details. We check the validity of the response, extract the text content, split it into lines and iterate through each line to identify key-value pairs. 

    These extracted details, such as Vendor Name, Invoice Number, and others, are returned in a structured object for further use.

    function updateSheet(details) {
     var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Invoices");
     var range = sheet.getDataRange();
     var values = range.getValues();
    
     var vendorName = details['Vendor Name'];
     var nameFound = false;
    
     var currentDate = Utilities.formatDate(new Date(), Session.getScriptTimeZone(), 'MM/dd/yy');
     var formattedDateTime = Utilities.formatDate(new Date(), Session.getScriptTimeZone(), "yyyy-MM-dd HH:mm:ss");
    
     for (var i = 1; i < values.length; i++) {
       if (values[i][2].toLowerCase() === vendorName.toLowerCase()) { // Compare by Vendor Name
         nameFound = true;
         sheet.getRange(i + 1, 1).setValue(details['Invoice Number']); // Column A
         sheet.getRange(i + 1, 6).setValue(details['Amount Due']); // Column F
         sheet.getRange(i + 1, 7).setValue(details['Due Date']); // Column G
         sheet.getRange(i + 1, 9).setValue("Last updated at: " + formattedDateTime); // Column I
         Logger.log("Updated Row " + (i + 1));
         break;
       }
     }
    
     if (!nameFound) {
       Logger.log("Vendor not found: " + vendorName);
       var newRow = values.length + 1;
    
       sheet.getRange(newRow, 1).setValue(details['Invoice Number']); // Column A
       sheet.getRange(newRow, 3).setValue(vendorName); // Column C
       sheet.getRange(newRow, 4).setValue(details['Description']); // Column D
       sheet.getRange(newRow, 5).setValue(vendorName); // Column E
       sheet.getRange(newRow, 6).setValue(details['Amount Due']); // Column F
       sheet.getRange(newRow, 7).setValue(details['Due Date']); // Column G
       sheet.getRange(i + 1, 9).setValue("Last updated at: " + formattedDateTime); // Column I
       Logger.log("New Row Added");
     }
    }

    The updateSheet() function is responsible for managing entries within the Google Sheet. It first checks if the client or vendor already exists in the sheet. If a matching entry is found, the function updates the existing record with the new details. However, if no existing record is found, a new entry is created to store the information.

    function moveFileToArchive(file, archiveFolder) {
     file.moveTo(archiveFolder);
    }

    The moveFileToArchive() function is a simple help function that takes the file and moves it to the archive folder after the data extraction and the Google Sheet is updated. 

    Our code is complete and good to go.

    Check the Output

    It’s time to see if the code is able to access the invoices, extract details and update the Google Sheet successfully. 

    Conclusion

    In today’s digital age, automation is key to staying competitive. By harnessing the power of Gemini 1.5 Flash and Google Apps Script, you can transform your invoice processing workflow, saving time, reducing errors, and empowering your team to focus on what truly matters.

    We encourage you to explore the possibilities of this solution and discover how it can revolutionize your data extraction processes. Remember, automation is not just about efficiency; it’s about unlocking new opportunities for growth and innovation.

    To see the full code in one place, click here.

    Read article

    Jason Murray-Rosel Promoted to Head of Client Success at Premier Cloud

      VICTORIA, BC, September 10, 2024 – Premier Cloud is excited to announce the promotion of Jason Murray-Rosel to Head of Client Success. Previously serving as Client Success Manager, Jason will now lead the Client Success team, focused on helping clients unlock the full potential of Google Cloud.

      In his new role, Jason will deliver strategies to ensure clients optimize their Google Cloud investments, enhancing scalability, productivity, and overall business performance. With a proven track record of driving results and ensuring customer satisfaction, Jason has been pivotal as the primary point of contact for clients. Under his leadership, the client success team will continue to strengthen Premier Cloud’s client-centric approach, further maximizing the value clients gain from our cloud services and solutions.

      Jason’s promotion is a reflection of his dedication, expertise, and ability to guide clients through their cloud journeys. As Premier Cloud continues to grow and expand its services, Jason’s role will be crucial in ensuring that clients not only adopt Google Cloud effectively but also use it to its fullest potential to drive growth and efficiency.

      Read article

      Premier Cloud and Lumin Partner to Streamline Productivity in Google Workspace

        Victoria, BC, August 22, 2024 — Premier Cloud Inc. is proud to announce its newly established partnership with Lumin, a cloud-based e-Signature & document management solution, to elevate the efficiency and functionality of Google Workspace for its clients.

        Lumin, known for its seamless integration with Google Workspace, offers a robust suite of tools designed to streamline document workflows. These tools enable real-time collaboration, secure storage, and advanced document editing directly within the Google ecosystem.

        Premier Cloud, a leading provider of Google Cloud services, recognizes the importance of meeting the evolving needs of modern businesses. With this partnership, Premier Cloud reinforces its commitment to delivering comprehensive cloud solutions that enhance business operations across various sectors.

        Key features of Lumin’s integration with Google Workspace include:

        • Efficient eSignature management
        • Real-time collaboration for improved workflow
        • Merging of multiple documents into a single PDF
        • Direct text editing within PDF files
        • Scanning capabilities that ensure documents are searchable and readable

        This partnership underscores Premier Cloud’s dedication to innovation and customer success by providing tools that drive productivity and simplify document management.

        For more information, please contact: 
        Alex Shahbazfar
        alex.s@premiercloud.com 
        888-866-2230

        Read article

        Premier Cloud ranked 8th place on CRN’s 2024 Fast Growth List

          VICTORIA, BC. August 7, 2024 — Premier Cloud Inc. is thrilled to announce its 8th-place ranking on CRN’s 2024 Fast Growth 150 list. This accomplishment affirms Premier Cloud as a leading cloud solutions provider in North America, highlighted by an impressive two-year growth rate of 242%.

          The CRN Fast Growth 150 list recognizes the fastest-growing technology solution providers in North America. Rankings are based on the two-year growth rate of companies with gross annual sales of at least $1 million. Our notable achievements in Data Analytics, Infrastructure Services, and Cloud Migrations have been pivotal in driving our success and earning this recognition.

          Premier Cloud’s business revolves around delivering bespoke solutions, tailored to address the unique business challenges and needs of our clients. Our client-centric approach is at the heart of our growth strategy, ensuring we consistently meet and exceed our clients’ expectations with each interaction. The company’s focus on client satisfaction is reflected in its exceptional client churn rate of less than 1%. 

          Looking forward, we are excited to continue our journey of growth and advancement while further strengthening our relationships with clients and partners. We remain committed to delivering top-notch solutions that drive business success and transformation.

          For more information, please contact: 
          Alex Shahbazfar
          alex.s@premiercloud.com 
          888-866-2230

          Read article

          Google I/O 2024 Recap: Unleashing the Power of AI with Vertex AI and Trillium

            The recent Google I/O conference saw Vertex AI, Google Cloud’s platform for building and deploying AI solutions, take center stage with a series of exciting announcements. These advancements empower developers and businesses to leverage the latest AI innovations, streamlining development processes and unlocking the true potential of AI.

            Supercharged Development with Powerful New Models

            Vertex AI welcomes a new member to its impressive lineup of large language models (LLMs): Gemini 1.5 Flash. This powerhouse boasts a 1-million-token context window, enabling applications like chatbots and virtual assistants to maintain a comprehensive understanding of conversation history. Imagine a virtual assistant that remembers every detail you’ve discussed, leading to more natural and personalized interactions.

            But power doesn’t have to come at the expense of efficiency. Gemini 1.5 Flash is specifically optimized for real-time applications, offering lightning-fast response times and smooth user experiences. This makes it ideal for situations like chat support or integrating AI into dynamic web applications.

            Beyond text-based interactions, Vertex AI introduces PaliGemma, a groundbreaking open-source visual-language model. PaliGemma excels at understanding and describing visual content, making it a powerful tool for tasks like image captioning, product image classification, and content moderation. E-commerce platforms can leverage PaliGemma to automatically generate engaging captions for product images, enhancing user experience and product discoverability.

            Streamlining Workflows and Reducing Costs

            Vertex AI goes beyond just providing powerful models. It also offers a suite of features designed to streamline development workflows and reduce costs. Here are some of the key highlights:

            • Context Caching: Managing the vast amount of data processed by models like Gemini 1.5 Flash can be expensive. Context caching tackles this challenge by intelligently storing and reusing relevant context data, significantly reducing processing costs for tasks requiring long context windows.
            • Controlled Generation: Fine-tuning AI outputs is crucial for real-world applications. Controlled generation allows developers to guide the model’s output by specifying desired attributes. This ensures the generated content, whether text or image captions, aligns perfectly with your needs.
            • Batch API: Speed up large-scale operations with the new batch API. This feature lets you submit multiple requests simultaneously, significantly improving processing efficiency for tasks involving large datasets.

            Powering the Future with Cutting-edge Hardware

            The foundation for all these advancements lies in Google’s cutting-edge hardware, specifically the 6th generation of Tensor Processing Units (TPUs) codenamed Trillium. This powerhouse boasts a staggering 4x performance boost over its predecessor. This translates to faster training times for your Vertex AI models, more efficient inference during deployment, and the ability to handle even more complex models in the future.

            Think of Trillium as the engine that propels Vertex AI forward. With its unparalleled performance, Trillium underpins the development of next-generation AI models within the Vertex AI platform, paving the way for groundbreaking applications across various industries.

            Conclusion

            The announcements at Google I/O solidify Vertex AI’s position as a leading platform for building and deploying powerful AI solutions. With a growing library of models like Gemini 1.5 Flash and PaliGemma, coupled with development-friendly features and the raw power of Trillium, Vertex AI empowers businesses to unlock the full potential of AI and deliver exceptional customer experiences.

            Read article

            Why Work with a Google Cloud Partner

              Partnering with a Google Cloud Premier Partner offers undeniable benefits for businesses that aim to leverage cloud solutions effectively. This strategic collaboration ensures access to certified expertise and a range of specialized services and aligns with Google Cloud’s commitment to involve partners in 100% of customer engagements, thus enhancing service delivery and customer success. Here’s why leveraging the expertise of a Google Cloud Premier Partner is crucial for maximizing your cloud investment and achieving successful digital transformation.

              Certified Expertise: Google Cloud Premier Partners possess a high level of certified expertise, ensuring they are well-equipped to handle complex cloud solutions. With a team of Google-certified professionals, partners like Premier Cloud provide specialized knowledge in infrastructure projects and data migrations, which is crucial for optimizing and securing your cloud environment.

              Enhanced Efficiency and Innovation: Engaging with a Premier Partner grants access to exclusive Google Cloud resources and tools. This partnership ensures that your cloud solutions are implemented using best practices and innovative frameworks that are robust and tailor-made to fit your business needs. This direct line to Google also includes advanced training and support, keeping your setups on the cutting edge.

              Cost-Effective Solutions: With flexible billing and contracting options available through a Google Cloud Premier Partner, your business can enjoy enhanced financial flexibility. Premier Partners have the ability to offer exclusive discounts on Google Cloud and Workspace licenses, reducing costs without sacrificing the quality or scope of services. It’s like receiving a tailored discount that aligns perfectly with your usage and budget requirements.

              Comprehensive Support: Google Cloud Premier Partners provide end-to-end support from setup and migration to ongoing management. This includes access to professional services from certified architects, proof of concept programs, and troubleshooting support. Partners ensure that your cloud environment is up and running and optimized for performance, security, and compliance.

              Strategic Business Advantages: Working with a Premier Partner means more than just technical expertise; it’s about strategic business transformation. Google Cloud Premier Partners are equipped to handle infrastructure projects and data migrations that can scale according to business growth, driving more value from your investment. They also provide ongoing insights and analytics to refine and adjust strategies, ensuring your cloud solutions evolve with your business needs.

              Choosing to work with a Google Cloud Premier Partner is more than an investment in cloud technology—it’s a partnership that fosters business growth, enhances technological capabilities, and ensures a competitive advantage in a digital-first world. By aligning with Google Cloud’s strategy to integrate partners fully, businesses can expect a holistic and enriched cloud experience that is tailored to their specific needs and goals. With such a partnership, companies are well-positioned to accelerate their cloud initiatives, ensuring they are both future-ready and competitive in the digital landscape.

              Read article
              Why Apigee is the Leading Solution for API Management for Businesses

              Beat the clock: Consolidate UA-3 Data Before the July 1, 2024

                In March 2022, Google announced the retirement of Universal Analytics, setting a timeline for businesses to transition to Google Analytics 4. Universal Analytics properties stopped processing new data from July 1, 2023, and access to historical data will not be available after July 1, 2024. This makes it imperative for users to take ownership of their data using BigQuery and Looker.

                Why is it Important to Migrate Your UA-3 Data

                If you don’t migrate your data, you will lose access to your historical data. This service ensures the preservation of historical analytics data that will no longer be available after the cutoff date. With a focus on maintaining data ownership and providing advanced analysis through BigQuery and Looker, Premier Cloud supports strategic business decision-making during this pivotal analytics shift.

                What is Premier Cloud’s UA-3 Data Migration Service?

                Our UA-3 data migration service can preserve your UA-3 data. We do this by extracting your Universal Analytics data to store them in BigQuery, and using Looker to visualize reports in a tailored but familiar manner.

                Benefits of Using Premier Cloud’s UA-3 Data Migration Service

                1. Data Migration & Preservation: Securing and transferring data to GA4, ensuring that historical analytics remain accessible and actionable.
                2. Customized Analytics Solutions: Utilizing Looker, the service offers tailored data models, reports, and dashboards to meet analytical needs.
                3. Ownership and Control Over Data: Providing direct access to BigQuery allows businesses to retain control over their data, supporting informed decisions and compliance with data security standards.

                Learn more about Premier Cloud’s UA-3 Data Migration Service

                To learn more about, and get started with Premier Cloud’s UA-3 Data Migration Service, go to: https://premiercloud.com/ua3-to-ga4-service/

                Read article

                Top 5 Recent Updates to Google Workspace

                  Google Workspace has introduced some useful updates! Let’s explore the latest enhancements across Google Chat, Sheets, Drive, Meet, and more as they aim to simplify your work processes and make them more efficient.

                  1. Identify with ease which unread threads are most relevant to you on Google Chat

                  Building on last year’s introduction of in-line threading for all new spaces, this feature allows for replying directly to messages, facilitating focused discussions separate from the main chat flow. Additionally, there are new integrated functionalities—home, mentions, and starred messages—to expedite navigation and ensure you can quickly locate essential conversations in Chat. Seeing the participant avatars for unread threads makes it easier to identify whether or not a thread is relevant to you. 

                  2. Google Sheets now has smooth scrolling on desktop.

                  Smooth scrolling enables you to navigate seamlessly to any section of a row or column in the spreadsheet, allowing for precise stops at your desired location. This enhancement is compatible with all Sheets functionalities, including frozen rows, concealed columns, charts, buttons, comments, within right-to-left layouts, and is fully operational with scrollbars as well as mouse or trackpad inputs.

                  3. Improving Navigation with the Google Drive application

                  Google is enhancing the search capabilities in the Google Drive app on iOS platforms (with an Android update anticipated shortly) through several key updates:

                  1. Streamlining Access to Query Filters: We’ve introduced categorized filters directly beneath the search bar for more straightforward access. These categories include filters for File Type, Owners, and Last Modified dates.
                  2. Displaying Contextual Query Filters: As you type your search query, relevant filters will appear, allowing for easy selection. This feature saves you from the need to type out entire searches, streamlining the process.
                  3. Enabling Further Refinement Post-Search: After arriving at the search results page, you’ll have the option to refine your search further for more precise outcomes.

                  4. Conduct 1:1 calls on Google Meet with people outside your video calling network

                  Earlier this year, Google introduced cloud-encrypted one-on-one video calls within the same domain through the Meet mobile app. They are now extending this capability to include users from outside your domain, allowing them to initiate one-on-one video calls with your users and vice versa. A new admin control will be introduced, enabling admins to pre-configure this feature as either enabled or disabled for their users.

                  5. Import and convert sensitive Excel files using client-side encryption

                  This feature empowers users with full control over their encryption keys and the choice of identity service for key authentication, facilitating seamless collaboration between Google Sheets and Microsoft Excel for both external and internal stakeholders.

                  Important features of the update:

                  • Support is limited to .xlsx Excel file types.
                  • Other Excel and tabular file types are not compatible.
                  • Features from Excel that are unsupported in Sheets will not be imported.
                  • The import is capped at files sizes up to 100MB.
                  • Up to 10 million cells can be imported from each file.
                  Read article

                  All the latest

                  Take your cloud journey to the next level

                  Newsletter
                  Subscribe for the latest updates from Premier Cloud, including new products, services, and exclusive offers.

                  Follow our Blog

                  |

                  © 2024 Premier Cloud Inc. All Rights Reserved.

                  We use cookies to deliver services and analyze traffic.