What Is Automation In Banking And How Has The USA Used It To Grow Its Economy?

Introduction

The banking industry has always tried to stay ahead of the curve in adapting to modernization. It was an early adopter in the information age and understood how deeply technology would work its way into people’s lives. This let it grow as a pioneer and become one of the largest consumers of information technology. Automation and AI are the next logical steps.

Automation in banking is the use of technology to run banking processes through highly automated means, keeping human intervention to a minimum.

Gartner reported that estimated spending on IT applications in the banking sector was $487 billion in 2018. A large part of this spending went to outsourced external companies, primarily Business Process Outsourcing (BPO) firms, which were paid approximately $63 billion. Such expenses can be reduced by evolving with technology, and the easiest way to minimize them is automation.

How has Automation Evolved Through the Ages?

 

Traditional Automation

Traditional automation programs machines to perform tasks. It relies primarily on APIs and other integration methods to connect systems, so developers must be well versed in the functionality of the target system, including its operational processes and methods.

Traditional automation is limited in some respects. Applications are hard to customize when their source code is not available, and integrations inherit the limitations of the APIs they depend on. Most of its methods are rather primitive for today’s digital transformation. Nonetheless, it is still prevalent in many places.

RPA

Robotic Process Automation (RPA) focuses on front-end activities and requires no changes to back-end operations, because it works across different applications. RPA bots operate at the user interface (UI) level, interacting with systems the way a human would, and offer better personalization and easier customization than traditional automation.

Some major features of RPA include:

  1. Reliance on easy-to-program functionality with reduced turnaround time (TAT).
  2. Bots execute individual functions such as email responses and data extraction.
  3. Works at the UI level, following user actions.

RPA is used for data collation, analysis, invoicing, email management, and other customer service functions. Implementing it cuts costs for banks across these areas, since RPA and traditional automation both relieve people of tedious tasks.

 

RPA market revenues worldwide, 2016–2022 (Source: Statista)

We must understand that RPA doesn’t replace any existing technology but works in tandem with the prevalent framework. In a nutshell, RPA handles repetitive, rule-based, and monotonous tasks and actions.

A common example of an RPA bot is the ubiquitous chatbot. Because RPA involves no AI, its scope to improve is limited: it helps the user but does not learn. Here we primarily discuss RPA applications and implementations.

Artificial intelligence

Artificial Intelligence is the latest technology for automation. It mimics basic human intelligence and takes automation a step further. AI-enabled systems comprehend, evaluate, and respond to complex problems and situations efficiently by using machine learning algorithms. Good examples of AI applications are voice assistants powered by natural language processing (NLP), such as Alexa, Google Assistant, and Siri.

According to joint research by Narrative Science and the National Business Research Institute, approximately 32% of service providers in the industry use AI technologies such as voice recognition and analytics to improve customer experience and ease processing.

AI has expanded to such an extent that the earlier technologies now fall under its umbrella. Even so, it is met with some skepticism: it could take over processing entirely, and traditionalists may question its dependability.

What Benefits Does Automation Offer That Make Banking Better?

Automation gives banking versatile features that make the entire process easier for banks and customers. It not only raises the standard of customer safety and privacy but also gives customers a more fulfilling experience. Some of the benefits include:

  1. Better Customer Service: RPA makes data management easier for daily inquiries, information transfer, application status, balance information, and similar requests. This frees employee time for more critical decisions and tasks. A chatbot, for example, saves time for every party involved.
  2. Improved Compliance: Banks are regulated by legislatures and other government bodies that prescribe many strict compliance guidelines. In a 2016 Accenture survey, 73% of respondents expected RPA to be a key enabler in compliance, because it increases productivity by being available 24 hours a day with very high accuracy.
  3. Accounts Payable: This requires extracting vendor information, validating it, and processing payments. Optical Character Recognition (OCR) reads data from physical forms and passes it to RPA, which handles the rest of the processing, making the process far more efficient than manual methods.
  4. Faster Credit Card Processing: Credit card processing that took days with traditional methods takes hours with RPA. Transaction data is properly maintained, which also supports better evaluation of credit scores.
  5. Faster Mortgage Loan: Even a minor error can impede loan processing. RPA can accelerate the process by avoiding unnecessary errors and implementing proper checks, reducing processing time from days to minutes.
  6. Vigilant Fraud Detection: RPA monitors transactions for red flags and recognizes fraudulent transaction patterns in real time. This considerably reduces response time and helps block and prevent fraud.
  7. More Credible KYC Process: Know Your Customer (KYC) is mandatory for banks for each customer. KYC compliance alone costs banks more than $384 million per year (Thomson Reuters). RPA can reduce this cost, along with the time the customer has to wait for a response.
  8. Data Report Automation: RPA helps generate reports for stakeholders in many formats. Bots auto-fill the available report template, keeping both errors and time spent to a minimum.
  9. Easier Account Closure Process: Customers benefit from a faster account closing process, which increases their affinity for the bank.

How is Automation Boosting The US Banking Sector?

The global market for robotic process automation in BFSI (banking, financial services, and insurance) was valued at USD 167.1 million in 2018 and is anticipated to register a CAGR of 31.3% from 2019 to 2025, a rather unprecedented rate of growth.

The advent of advanced technologies and the need for more productive operations led the entire US BFSI sector to significantly boost its demand for RPA. Because the USA has a rich inventory of legacy systems, the incorporation and advancement of RPA there was notable, and it increased the agility and precision of processing.

Even casual users can check their accounts and set up automatic bill payments. KYC verification and numerous other functions have also become much easier, and many other back-end and front-end processes are automated using RPA.

 

Source: www.grandviewresearch.com

In the US, RPA has been integrated to a considerable degree, for example through enterprise-level robot deployments, as an alternative to services such as BPO that would otherwise have been tremendously expensive. Automation has also eliminated repetitive, time-consuming tasks. It reduces the cost of such tasks by 25% to 50% and cuts TAT to a minimum.

Spending on AI and RPA in the US banking and finance industry grew by 82.9% during 2018 to reach US$ 696.3 million. Over the forecast period (2019–2025), spending on AI is expected to grow at a CAGR of 28.4%, from US$ 1,094.9 million in 2019 to US$ 6,289.1 million by 2025.
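Growth figures like these can be sanity-checked by computing the compound annual growth rate from the two endpoints. The snippet below is a minimal sketch: the function name `cagr` is our own, and how many compounding periods the original forecast assumed is not stated, so both six and seven periods are tried.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Figures quoted above, in US$ millions.
spend_2019 = 1094.9
spend_2025 = 6289.1

# Assumption: 2019 -> 2025 counted as six compounding periods.
rate_six = cagr(spend_2019, spend_2025, periods=6)
# Assumption: the same endpoints spread over seven periods.
rate_seven = cagr(spend_2019, spend_2025, periods=7)

print(f"6 periods: {rate_six:.1%}")
print(f"7 periods: {rate_seven:.1%}")
```

With six periods the implied rate is noticeably higher than the quoted 28.4%, while seven periods lands close to it, which suggests the forecast may have counted the window differently.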

The USA and Canada dominated the RPA market in the banking industry in 2018. On average, a U.S. bank with USD 10+ billion in assets spends approximately USD 50 million per year on customer due diligence (CDD), KYC compliance, and onboarding. The rising cost of KYC and anti-money laundering (AML) compliance, coupled with steep fines under regulatory scrutiny, is pushing financial institutions to adopt new technology and automation, which helps prevent identity theft, financial fraud, terrorist funding, and money laundering.

The USA and Canada are set to dominate the financial RPA market for at least the next half-decade. Banks aim to retain patrons and reduce customer attrition, and RPA helps by organizing customer data so that customers can be contacted as required. The North American market, valued at $376.2 billion in 2019, is projected to reach $721.3 billion by 2027. Digital payments, the largest service segment in the industry, are expected to lead the market, helped by the growth in banking products and sales through online portals. In 2019 the digital sales sector was valued at $609.4 billion.

Top Banks in the US Taking Automation to the Next Level

  1. JPMorgan Chase
    JPMorgan Chase, the biggest bank in the US, has always led in technology investment. Its $11.4 billion investment in AI technology shows its enthusiasm for innovation and its far-sighted outlook (Source: JPMorgan Chase Annual Report 2019). The bank uses AI to improve its databases, for search optimization, and for Contract Intelligence (COiN), a machine learning technology that uses chatbot systems to build vast databases of legal documents in a short time.
  2. Bank of America
    Bank of America focuses primarily on fraud detection, trading functions, and chatbots. Its AI-enabled chatbot Erica, introduced in late 2017, understands text and speech. It not only answers inquiries but also advises users on suitable financial decisions. Erica had approximately 6 million users as of March 2019. Over the past ten years the $35 billion lender has invested more than $1 billion in mobile banking, the simplest area of automation for customers. Its own study found that mobile customers have been increasing by 10% annually.
  3. CitiBank
    With the aim of preventing money laundering and fraud, the bank is investing heavily in automation in general and AI technology in particular. It partnered with Feedzai in 2016 to detect fraudulent transactions, for example by recognizing patterns of multiple transactions from multiple locations the customer does not usually travel to. The bank has a global network of tech giants that take part in its 6 Citi Global Innovation Labs. With these advances in automation and technology, the bank expects to save $600 million per year.
  4. Wells Fargo
    Wells Fargo’s chatbot system focuses primarily on answering customer queries without taking too much time or requiring physical presence. The bank also developed a mobile app built on predictive analytics. It alerts customers to issues such as unusually high bill payments, and even helps users with travel plans and buying flight tickets. In 2019 alone, Wells Fargo spent nearly $9 billion on technology and automation.
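The location-based pattern mentioned for CitiBank can be illustrated with a simple rule. This is a minimal sketch, not Feedzai's or any bank's actual model: the `Transaction` fields, the one-hour window, and the two-city threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Transaction:
    customer_id: str
    city: str
    timestamp: datetime


def flag_unusual_bursts(transactions, usual_cities,
                        window=timedelta(hours=1), min_cities=2):
    """Return ids of customers with transactions from several unfamiliar
    cities inside one time window.

    usual_cities maps customer_id -> set of cities the customer normally
    transacts in (hypothetical profile data).
    """
    flagged = set()
    recent_by_customer = {}
    for tx in sorted(transactions, key=lambda t: t.timestamp):
        # Transactions in a customer's usual cities are not of interest.
        if tx.city in usual_cities.get(tx.customer_id, set()):
            continue
        recent = recent_by_customer.setdefault(tx.customer_id, [])
        recent.append(tx)
        # Keep only unfamiliar-city transactions still inside the window.
        recent[:] = [t for t in recent if tx.timestamp - t.timestamp <= window]
        if len({t.city for t in recent}) >= min_cities:
            flagged.add(tx.customer_id)
    return flagged


t0 = datetime(2019, 3, 1, 12, 0)
txs = [
    Transaction("alice", "Boston", t0),
    Transaction("alice", "Lagos", t0 + timedelta(minutes=5)),
    Transaction("alice", "Jakarta", t0 + timedelta(minutes=20)),
    Transaction("bob", "Denver", t0),
    Transaction("bob", "Austin", t0 + timedelta(hours=5)),
]
profiles = {"alice": {"Boston"}, "bob": {"Denver"}}
flagged_customers = flag_unusual_bursts(txs, profiles)
print(flagged_customers)
```

A production system would learn these thresholds from data rather than hard-code them; the fixed rule here only shows the shape of the check.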

Conclusion

The global adoption of the digital era is inevitable, making banking and automation essentially complementary to each other. The automation of the banking industry through traditional automation, RPA, and AI has led developed nations like the USA to build a more efficient and sustainable economy.

Banks and financial institutions adopted IT swiftly because their operations, when executed manually, consume immense time and effort. Employees end up performing routine duties and miss the opportunity to move up the value pyramid. Automation produces a standardized audit trail, ensures the right people have access to the proper systems, and keeps financial institutions in line with industry standards while decreasing the expenses involved.

The need for automation in banking is well established. Its implementation has been mostly successful but, like all things, it still needs improvement. At the end of the day, the adoption of automation by banks and other financial institutions is a matter of ‘when’ rather than ‘if’.

About Signzy

Signzy is a market-leading platform redefining the speed, accuracy, and experience of how financial institutions are onboarding customers and businesses – using the digital medium. The company’s award-winning no-code GO platform delivers seamless, end-to-end, and multi-channel onboarding journeys while offering customizable workflows. In addition, it gives these players access to an aggregated marketplace of 240+ bespoke APIs that can be easily added to any workflow with simple widgets.

Signzy enables ten million+ end customer and business onboardings every month at a success rate of 99% while reducing the time to market from 6 months to 3-4 weeks. It works with 240+ FIs globally, including the 4 largest banks in India and a Top 3 acquiring Bank in the US, and has a robust global partnership with Mastercard and Microsoft. The company’s product team is based out of Bengaluru and has a strong presence in Mumbai, New York, and Dubai.

Visit www.signzy.com for more information about us.

You can reach out to our team at reachout@signzy.com

Written By:

Signzy

Written by an insightful Signzian intent on learning and sharing knowledge.

Image Forgery: Innovations In Technology By Signzy

Image forgery has long been a pressing issue in the realms of digital media, cybersecurity, and even legal proceedings. As technology advances, so do the techniques for creating increasingly convincing forgeries. This raises critical concerns for the integrity of digital information and calls for innovative solutions to detect and prevent fraudulent manipulations.

While Facebook, Microsoft, and many others are banding together to help make machine learning capable of detecting deepfakes in videos, we at Signzy are trying to solve a similar problem, detecting fakes in documents. In the journey of building the global digital trust system, we at Signzy had to solve this major challenge of detecting image manipulations in identity documents.

 

Fig 1.0 Example of our forgery detection in action

In this blog, I will try to explain how we built an image manipulation detection system using deep learning.

 

 

The above images are examples of the advancements in image manipulation techniques. It takes considerable effort for a human to spot that an image is forged. Few features distinguish real from fake, which makes forgeries difficult to detect with the human eye.

Our objective was to build a system that could detect manipulated document images.

Our first step was to create a dataset of forged documents to test the algorithm. With our expertise and domain knowledge in this field, we came up with various scenarios for how an intruder would forge a document. The corresponding data for these scenarios was prepared by Photoshop experts.

The forged documents fell mostly into two categories.

  1. Copy paste: A region of the image is copied from one document and pasted into a different document.
  2. Copy move: A region of the image is copied from a document and pasted into the same document.

Copy paste

In this type of forgery, a fraudster copies a face from one document into another document. Our goal was to detect these forged regions and to classify the document as fake or real.

 

The dataset that our Photoshop experts created manually was not large enough to train a deep learning solution. So we developed image processing algorithms that could generate synthetic forged data. With that, we were all set for experimentation.
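The generation algorithms themselves are not described here. One plausible minimal version copies a random rectangle either within a document (copy move) or from a donor document (copy paste) and records a ground-truth mask. Everything below is an illustrative sketch: the function names, the fixed region size, and the plain list-of-rows grayscale image format are assumptions.

```python
import random


def copy_region(src, dst, src_top, src_left, dst_top, dst_left, height, width):
    """Paste a height x width region of src into a copy of dst.

    Returns (forged_image, mask), where mask marks pasted pixels with 1.
    Images are lists of rows of grayscale values.
    """
    forged = [row[:] for row in dst]
    mask = [[0] * len(dst[0]) for _ in dst]
    for r in range(height):
        for c in range(width):
            forged[dst_top + r][dst_left + c] = src[src_top + r][src_left + c]
            mask[dst_top + r][dst_left + c] = 1
    return forged, mask


def synth_forgery(doc, donor=None, size=(4, 4), rng=None):
    """Copy move when donor is None, copy paste when a donor image is given."""
    rng = rng or random.Random()
    src = doc if donor is None else donor
    h, w = size
    src_top = rng.randrange(len(src) - h + 1)
    src_left = rng.randrange(len(src[0]) - w + 1)
    dst_top = rng.randrange(len(doc) - h + 1)
    dst_left = rng.randrange(len(doc[0]) - w + 1)
    return copy_region(src, doc, src_top, src_left, dst_top, dst_left, h, w)


rng = random.Random(0)
doc = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
donor = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
moved, moved_mask = synth_forgery(doc, rng=rng)
pasted, pasted_mask = synth_forgery(doc, donor=donor, rng=rng)
```

The mask doubles as the training label for region detection. A realistic generator would also recompress, resize, or blend the pasted region, which this sketch leaves out.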

For forged region detection, our approach was to start with state-of-the-art object detection methods. We tried Faster R-CNN (FRCNN) to predict the bounding boxes of the forged region along with the class information. FRCNN uses convolutional nets to extract feature maps from the input image. These maps are passed to a Region Proposal Network, which proposes bounding boxes. The proposals are passed to the ROI pooling layer, which converts them all to the same size. Finally, they are passed to a fully connected layer that predicts bounding boxes and classes. This method did not give us good results because the forged regions were very small.

Our second approach was to train a patch-based classifier that could distinguish real patches from forged ones. The idea rested on the assumption that if the copied region has a different compression footprint from the region it is pasted into, there will be a strong shift in the way the pixels are grouped. This method proved very effective.

 

It gave us around 97% accuracy. We ran many ablation studies to find the right configurations, which I can’t reveal due to IP restrictions.
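The overall shape of a patch-based pipeline can still be sketched: split the document into patches, score each patch, and aggregate the scores. The classifier is not disclosed, so `score_patch` below is a hypothetical stand-in (any callable returning a forgery probability), and the patch size, stride, and threshold are assumed values.

```python
def iter_patches(image, size, stride):
    """Yield (top, left, patch) for every size x size window in a 2-D list."""
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            patch = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, patch


def classify_document(image, score_patch, size=8, stride=4, threshold=0.5):
    """Return (is_forged, suspicious), where suspicious lists flagged patches.

    score_patch is any callable mapping a patch to a forgery probability;
    in practice it would be a trained classifier.
    """
    suspicious = [
        (top, left, size)
        for top, left, patch in iter_patches(image, size, stride)
        if score_patch(patch) >= threshold
    ]
    return bool(suspicious), suspicious


def toy_score(patch):
    # Hypothetical stand-in: treats any very bright pixel as tampering.
    return 1.0 if max(max(row) for row in patch) > 200 else 0.0


clean = [[0] * 16 for _ in range(16)]
tampered = [row[:] for row in clean]
tampered[10][10] = 255
is_forged, suspicious = classify_document(tampered, toy_score)
```

Overlapping patches (stride smaller than size) mean a small forged region is seen by several windows, which helps when the forged area is tiny compared to the document.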

Copy Move

In this type of forgery, a fraudster changes text in an image by copying similar text from the same image, for example to change a date. Our goal was to detect these forged regions and to classify the document as fake or real.

 

There is a lot of literature on detecting this type of forgery. A popular method is DCT-based feature matching. A discrete cosine transform (DCT) followed by quantization is applied to a 16×16 patch extracted from the image. The same operation is repeated across the entire image, and the resulting coefficient vectors are sorted so that similar patches end up next to each other. For each pair of neighboring rows in the sorted matrix, the shift vector between the two patch positions is calculated. If a region has been copied, many patch pairs share the same shift vector. This is a very powerful algorithm that works well in most scenarios. In our use case, however, a document has many regions with the same DCT values, so this method could not be applied.
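A minimal sketch of this matching scheme follows. It is a simplified rendering of the literature method, not our production approach: it assumes exact matches of quantized coefficients, uses a naive DCT for readability, and the quantization step is an arbitrary choice. The demo uses 4×4 blocks instead of 16×16 only to keep the pure-Python run fast.

```python
import math
import random
from collections import Counter


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
           for u in range(n)]
    return [
        [scale[u] * scale[v] * sum(block[x][y] * cos[u][x] * cos[v][y]
                                   for x in range(n) for y in range(n))
         for v in range(n)]
        for u in range(n)
    ]


def shift_vector_counts(image, block=16, quant=8.0):
    """Count shift vectors between blocks with identical quantized DCTs."""
    rows, cols = len(image), len(image[0])
    features = []
    for top in range(rows - block + 1):
        for left in range(cols - block + 1):
            patch = [row[left:left + block] for row in image[top:top + block]]
            key = tuple(round(c / quant) for row in dct2(patch) for c in row)
            features.append((key, (top, left)))
    features.sort()  # identical feature vectors become neighbours
    counts = Counter()
    for (key_a, pos_a), (key_b, pos_b) in zip(features, features[1:]):
        if key_a == key_b:
            counts[(pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])] += 1
    return counts


# Demo: copy a 6x6 region from (1, 1) to (9, 8) inside a noisy 16x16 image.
rng = random.Random(0)
img = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
for r in range(6):
    for c in range(6):
        img[9 + r][8 + c] = img[1 + r][1 + c]
counts = shift_vector_counts(img, block=4)
print(counts.most_common(1))
```

A shift vector that occurs many times points to a copied region. On a document, blank background produces huge numbers of identical keys and therefore many spurious shift vectors, which is the limitation described above.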

Our method involved two parallel networks. First, an encoder-decoder network predicts forged regions pixel by pixel. A second network, running in parallel, finds feature maps that correlate with the forged region predicted by the first. Both networks are trained together with a cumulative loss function. I regret that I can’t reveal the full solution due to IP restrictions.

To summarize, I have explained the two major types of forgery found in documents and the approaches we took to solve this challenging problem. I hope you had a nice read.

Written By:

 A B Sarvanan

Tech Lead — AI Team (Signzy)

 

A more reliable, secure and private video conferencing

Until the COVID-19 pandemic tapers off, working from home and operating remotely are our current “normal”. Digital transformation has been forced upon companies that want to stay afloat and ride the wave of changes this situation calls for.

Work processes are adopting new workflows and technology to keep this period productive rather than stagnant. Staying connected tops the list of work-from-home priorities. All interactions and meetings have moved to calls and video conferencing, and millions have downloaded third-party video conferencing tools in this period. A few weeks in, however, privacy concerns have started circling many video conferencing platforms.

Privacy plague

Video conferencing has surged in popularity recently. Everything is being done online, from taking school lessons and virtually attending weddings to hosting cabinet meetings. But its privacy shortcomings have now been brought to the fore. In an era of social distancing, as everything goes digital, we cannot afford to distance ourselves from online security. It is imperative to protect personal and organizational data shared over the digital space. With most of the tech industry holed up at home, the sheer volume and frequency of shared data have multiplied.

In the past few weeks an online harassment method termed “Zoombombing” emerged [1]. Malefactors disrupted calls on the platform Zoom by flashing inappropriate content such as pornography, hate speech, and shock videos. Privacy advocates also revealed that popular video conferencing tools were caught sending personal data to Facebook. News reports are replete with such privacy concerns exposing these apps’ vulnerabilities.

Whether you’re the type to have tape over your laptop camera or not, it is safer to distance yourself from unsafe platforms. At the same time, privacy does not have to be sacrificed at the feet of convenience.

Digital Trust for Banks and Financial Institutions

For banks and financial institutions, it is imperative to maintain processes that do not jeopardize the privacy of their customers and, at the same time, offer protection from fraud. A successful example of a banking workflow adapted to be 100% digital is the Know-Your-Customer process for onboarding and customer verification.

Using VideoKYC ensures there are no compromises on safety standards. We have honed the process with numerous layers of checks and balances, including AI-enabled video forensics and identity document checks. They eliminate security gaps by combining human scrutiny with software checks and ML- and AI-enabled analysis.

While generic video conference tools are not secure enough for financial services, our systems have always been designed to banking-grade standards. We’ve developed our tools so that banks and financial institutions can trust us with their data. We have now taken this a step further with our video-conferencing tool, developed with the needs of banks and financial institutions in mind.

In some cases the COVID-19 crisis is serving as an impetus to go digital. In other cases digital help is needed to coordinate between offsite and onsite officials. It is a daily need for confidential cross-country interaction. Either way video conferencing is essential to preserve uninterrupted work.

Enumerated below are some uses and features of this technology:

  • Because it is a safe and secure method of communication with no scope for privacy infringement, banks can schedule calls with customers. This cuts down on the back-and-forth that accompanies financial transactions.
  • Instead of having to be physically present, relationship managers from banks can now use our tool to communicate with users. During COVID-19, this helps banks continue functioning normally, with higher efficiency. Our compliant VideoKYC has now merged with video conferencing, allowing regulated entities (REs) to clarify issues in real time.
  • The features are customizable for the bank. The organizer (the bank) can restrict the functionality available to the user. For example, a bank can decide not to let the user switch off video during the interaction.
  • Calls can be audited, and any breach in protocol can be caught through this auditing. Because the tool was developed with banks in mind, no other third-party software enables this.

Certainty of security in a time of uncertainty

We can’t say how long you’ll have to work from home. But we can ensure that our tools are tested to be secure, simple, and compliant.

  • No leakage of data
    The platform prevents the leakage of personal data such as email IDs and photos.
  • End-to-end encryption
    We ensure end-to-end encryption of all data shared over our platform. A third party cannot decrypt the calls.
  • Seamless communication
    While the technology ensures full protection of the interaction, the UI ensures it is also easy to use and seamless.
  • Only a person with an invitation can join the call. This prevents hackers or miscreants from disrupting it, so there is no scope for malicious activity such as “Zoombombing”.
  • Signzy has control over the data flow. This matters given recent concerns about video conferencing platforms routing data through China [2].
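How invitation-only access is enforced is not described here. One common way to build such a gate is a signed invite token. The sketch below assumes an HMAC-based scheme, and the function names, meeting id, and email address are all hypothetical.

```python
import hashlib
import hmac
import secrets


def make_invite(secret: bytes, meeting_id: str, invitee: str) -> str:
    """Sign (meeting_id, invitee) so the server can verify the invite later."""
    message = f"{meeting_id}:{invitee}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def may_join(secret: bytes, meeting_id: str, invitee: str, token: str) -> bool:
    """Admit a participant only if the token matches their invitation."""
    expected = make_invite(secret, meeting_id, invitee)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, token)


server_secret = secrets.token_bytes(32)  # kept on the server only
token = make_invite(server_secret, "loan-review-42", "customer@example.com")

print(may_join(server_secret, "loan-review-42", "customer@example.com", token))
print(may_join(server_secret, "loan-review-42", "intruder@example.com", token))
```

Because the token is bound to both the meeting and the invitee, a leaked meeting link alone is not enough to join.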

Companies that adopt Signzy’s secure video conferencing have one less thing to worry about in these strange times.

Written By:

Signzy


 

Removing blur from images

Everyone misses a perfect shot once in a while. Yeah, that’s a real shame (we all do it all the time!).

There are special moments we want to capture and remember for a lifetime, but a shaky hand or noise in the camera can ruin them and leave us with blurred images. (Maybe your subject was on the move. The reason is not always a bad camera; bad timing plays a part as well!)

So, if you are one of us who has missed out on a special moment, this post is just for you. In this post, you will get to know how to restore blurred images. All the thanks and applause go to neural networks.

What are you going to learn?

In this blog post, you will learn how neural networks can deblur images using Scale-recurrent Networks (SRN). For more info on the technique, you can access this link. The network takes a sequence of blurry images at different scales as input and produces a corresponding set of sharp images. The final output image is at full resolution.

Figure 1: SRN architecture from the original paper

The method builds on end-to-end trainable networks for image deblurring, in particular the state-of-the-art multi-scale convolutional neural network approach.

These methods start from a very coarse scale of the blurry image and gradually recover the latent sharp image at higher resolutions.

The Scale-recurrent Network (SRN) applies this idea to multi-scale deblurring. In well-established multi-scale methods, the solver and its parameters are usually the same at each scale. This is a natural choice, since each scale aims to solve the very same problem. Varying the parameters from scale to scale may cause instability and the extra problem of an unrestricted solution space. Another concern is that input images may have different resolutions and motion scales.

If too much parameter tweaking is allowed at each scale, the solution may overfit to a specific motion scale. The SRN authors argue that the same parameter-sharing scheme should apply to CNN-based methods, yet some recent cascaded networks still use independent parameters for every scale. The authors therefore propose sharing network weights across scales, which significantly reduces training difficulty and brings stability benefits.
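The coarse-to-fine loop with shared parameters can be sketched in a few lines. This is only a structural illustration under stated assumptions: `solver` is a placeholder for the network (here an unsharp-mask-like step), the pyramid uses simple 2x averaging, and the recurrent hidden state of the real SRN is left out.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block (even dimensions assumed)."""
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, wide[:]])
    return out


def solver(blurry, previous, params):
    """Placeholder for the network: boosts the detail that separates the
    blurry input from the upsampled coarser estimate."""
    a = params["strength"]
    return [[b + a * (b - p) for b, p in zip(brow, prow)]
            for brow, prow in zip(blurry, previous)]


def coarse_to_fine(blurry, n_scales, params):
    """Run the same solver with the same params from coarsest to finest scale."""
    pyramid = [blurry]
    for _ in range(n_scales - 1):
        pyramid.append(downsample(pyramid[-1]))
    estimate = None
    for level in reversed(pyramid):
        previous = level if estimate is None else upsample(estimate)
        estimate = solver(level, previous, params)  # weights shared across scales
    return estimate


flat = [[10.0] * 8 for _ in range(8)]
result = coarse_to_fine(flat, n_scales=3, params={"strength": 0.5})
```

The single `params` dictionary passed to every scale is the point of the sketch: one set of weights serves all resolutions, rather than one set per scale.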

Their experiments show that, with the recurrent structure and the advantages above, the end-to-end deep image deblurring framework greatly improves training efficiency. It uses less than a third of the trainable parameters and has a faster testing time. Apart from this, their method is shown to produce high-quality results both qualitatively and quantitatively. Let’s not dive too deep into the research paper for now. Allow me to present our use case for this deblurring technology.

We are a well-established global digital trust company, working primarily in the domain of verification. For verification, our customers have to take photos of their documents and submit them. These photographs are quite likely to be blurred by camera shake or motion, which makes the document text difficult to read.

To solve the blurred image problem, we fed these images into the aforementioned deblurring model. The results were exhilarating. Below are some of the samples.

Concluding Remarks

What did you learn from this blog? You learned how a scale-recurrent network can deblur images. With this technology, you can extract data from blurred identity card images more easily, and you don’t have to poke your customers again and again to resubmit documents because of poor-quality or blurred images. Thanks for the read, and do leave a comment to let me know what you feel about this technology. Adios for now, fellas!

Written By:

Signzy

Written by an insightful Signzian intent on learning and sharing knowledge.

 

How we built a modern, state of the art OCR pipeline — PreciousDory

After a long wait, I am very happy to finally be writing this blog. As the title suggests, PreciousDory is a modern optical character recognition (OCR) engine that performs better than the engines from tech giants like Google, Microsoft, and ABBYY in KYC use cases. We feel it is now time to tell the world how we built this strong OCR pipeline over the last couple of years.

We at Signzy are trying to build a global digital trust system. We solve various fascinating problems related to AI and computer vision. Of these, text extraction from document images was one of the critical problems we had to solve. In the initial phase of our journey, we used a traditional rule-based OCR pipeline to extract text data from document images. Those OCR engines were not efficient enough to compete with global competitors. So, to stay competitive in the global market, we took the ambitious decision to build a modern in-house OCR pipeline. We wanted to build an OCR engine that would surpass the global leaders in that segment.

 

The herculean challenge was set, and our AI team accepted it gladly. We knew that building a production-ready OCR engine and achieving best-in-class results is not an easy task. But we are a bunch of gallant people in our AI team. When we started researching the problem, we found very few resources to help us out. And we also stumbled upon the meme below.

 

If You Can’t Measure It, You Can’t Improve It

The first thing our team did was create a test dataset that would represent all the real-world scenarios we could encounter. The scenarios include varying viewpoints, illumination, deformation, occlusion, background clutter, etc. Below are some samples from our test dataset.

Sample test data

When you have a big problem to solve, break it down into smaller ones

We spent quite a lot of time on literature study, trying to break the problem into sub-problems so that our individual team members could start working on them. We ended up with the macro-level architecture below.

Macro level architecture

After coming up with the basic architecture, our team started exploring the individual entities. Our core OCR engine comprises 4 key components.

  1. CropNET
  2. RotationNET
  3. Text localizer
  4. Word classifier

CropNET

This is the first step in the OCR pipeline. The input documents for our engine have a lot of background noise. We needed an algorithm to crop out exactly the region of interest so that the job gets easier in the subsequent steps. In the initial phase, we tried out a lot of traditional image processing techniques like edge detection, color matching, Hough lines, etc. None of them could withstand our test data. Then we took the deep learning approach. The idea was to build a regression model to predict the four edges of the document to be processed. The training data for this model was the ground truth containing the four coordinates of the document. We implemented a custom shallow architecture for predicting the outputs. We achieved good performance from the model.
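
To make the regression framing concrete, here is a minimal sketch. It assumes the "four coordinates" are the (x, y) corner points of the document, listed as top-left, top-right, bottom-right, bottom-left, and that they are scaled to the 0 to 1 range for training. The function names and the scaling are illustrative assumptions, not the actual CropNET code.

```python
import math


def corners_to_target(corners, width, height):
    """Flatten four (x, y) corners into 8 regression targets scaled to [0, 1]."""
    target = []
    for x, y in corners:
        target.extend([x / width, y / height])
    return target


def target_to_corners(target, width, height):
    """Map an 8-value model output back to pixel coordinates."""
    return [(target[i] * width, target[i + 1] * height) for i in range(0, 8, 2)]


def rectified_size(corners):
    """Estimate the size of the cropped document from its corner points.

    Assumes the order top-left, top-right, bottom-right, bottom-left.
    """
    tl, tr, br, bl = corners
    out_w = max(math.dist(tl, tr), math.dist(bl, br))
    out_h = max(math.dist(tl, bl), math.dist(tr, br))
    return round(out_w), round(out_h)


# Hypothetical ground truth for a 1000 x 500 pixel photo.
corners = [(100, 50), (500, 50), (500, 350), (100, 350)]
target = corners_to_target(corners, width=1000, height=500)
print(target)
print(rectified_size(corners))
```

The predicted corners and the estimated size are what a perspective warp would need in order to produce the cropped document for the next stage.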

RotationNET

This is the second stage in the pipeline. After cropping, the next problem to solve is rotation. It was estimated that 5% of the production documents would be rotated at arbitrary angles. But for the OCR pipeline to work properly, the document should be at zero degrees. To tackle the problem, we built a classification model that predicts the angle of the document. There are 360 classes, one for each degree of rotation. The challenge was in creating the training data. As we had only a few real-world samples for training each class, we had to build a custom, exhaustive pipeline for preparing synthetic training data that closely matches real-world data. Upon training, we achieved impressive results from the model.
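
As a rough illustration of how a 360-class output could be used, the sketch below picks the most likely angle class, derives the rotation needed to bring the document back to zero degrees, and measures error on a circle so that 359 and 1 degrees count as close. The function names and the sign convention are assumptions for illustration only.

```python
def predicted_angle(scores):
    """Return the angle class (0-359) with the highest score."""
    if len(scores) != 360:
        raise ValueError("expected one score per degree (360 values)")
    return max(range(360), key=lambda k: scores[k])


def correction_angle(angle):
    """Rotation, in the same direction convention, that brings the document to 0 degrees."""
    return (360 - angle) % 360


def angular_error(predicted, actual):
    """Smallest distance between two angles on a circle, in degrees."""
    diff = abs(predicted - actual) % 360
    return min(diff, 360 - diff)


# Hypothetical classifier output that is confident about 90 degrees.
scores = [0.0] * 360
scores[90] = 1.0
angle = predicted_angle(scores)
print(angle, correction_angle(angle), angular_error(359, 1))
```

The same rotation step, run in reverse on upright documents with known angles, is one way labeled synthetic samples for each class could be produced.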

Text localizer

The third stage is localizing the text areas. This is the most challenging problem to solve. Given a document, the algorithm must be able to localize the text regions for further processing. We knew that building this algorithm from scratch would be a mammoth task, so we benchmarked various open source text detection models on our test datasets.

Text localization — Benchmark

After rigorous testing, we decided to go with CTPN. Connectionist Text Proposal Network (CTPN) accurately localizes text lines in natural images. It detects a text line as a sequence of fine-scale text proposals directly in convolutional feature maps. It was developed with a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore rich context information in the image, making it powerful enough to detect extremely ambiguous text.
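
The idea of building a text line out of fixed-width proposals can be sketched in a few lines. This is a simplified, greedy stand-in for the proposal-connection step, not the CTPN implementation: the dictionary layout and the default width, gap, and overlap thresholds are illustrative assumptions.

```python
def vertical_overlap(a, b):
    """Overlap of two proposals' vertical extents, relative to the shorter one."""
    top = max(a["top"], b["top"])
    bottom = min(a["bottom"], b["bottom"])
    shorter = min(a["bottom"] - a["top"], b["bottom"] - b["top"])
    return max(0, bottom - top) / shorter if shorter > 0 else 0.0


def connect_proposals(proposals, width=16, max_gap=50, min_overlap=0.7):
    """Greedily chain fixed-width proposals, left to right, into text-line boxes."""
    lines = []
    for p in sorted(proposals, key=lambda q: q["x"]):
        for line in lines:
            last = line[-1]
            close = p["x"] - last["x"] <= max_gap
            if close and vertical_overlap(last, p) >= min_overlap:
                line.append(p)
                break
        else:
            lines.append([p])  # no existing line fits, start a new one
    # Each box is (left, top, right, bottom).
    return [
        (line[0]["x"], min(q["top"] for q in line),
         line[-1]["x"] + width, max(q["bottom"] for q in line))
        for line in lines
    ]


# Hypothetical proposals: three on one text line, two on another further down.
proposals = [
    {"x": 0, "top": 10, "bottom": 30},
    {"x": 16, "top": 10, "bottom": 30},
    {"x": 32, "top": 12, "bottom": 31},
    {"x": 0, "top": 100, "bottom": 120},
    {"x": 16, "top": 100, "bottom": 120},
]
boxes = connect_proposals(proposals)
print(boxes)
```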

 

Word classifier

This is the final stage and the most critical step in the OCR engine. This is the step where most of our effort and time went. After localizing the text regions in the document, the regions of interest are cropped out of the document. The final challenge is then to predict the text from them. After a rigorous literature study, we arrived at two approaches for solving this problem.

  1. Character level classification
  2. Word level classification

Character level

This is one of the traditional approaches. In this method, the bounding boxes of individual characters are estimated, and the characters are cropped out and presented for classification. What we then have in hand is an MNIST-like dataset. Building a classifier for this type of task is a tried and tested method. But the real challenge in this approach was building the character-level bounding box predictor. Normal segmentation methods failed to perform on our test dataset. We thought of developing an FRCNN-like object detection pipeline for localizing the individual characters. But creating the training data for this method was a tedious task involving a lot of manual work. So we ended up dropping this method.

Word level classifier

This method is based on deep learning. Here we pass the full localized text region into an end-to-end pipeline and directly get the predicted text. The cropped text region is passed into a CNN for spatial feature extraction and then on to an RNN for extracting temporal features. We use CTC loss to train the architecture. CTC loss solves two problems: 1. You can train the network from (image, text) pairs without having to specify the position at which each character occurs. 2. You don’t have to postprocess the output, as a CTC decoder transforms the network output into the final text.
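
To show what the CTC decoder in point 2 does, here is a minimal greedy (best-path) decoding sketch. It assumes class index 0 is the CTC blank and that indices 1 and up map to the characters of a hypothetical alphabet; a production decoder may use beam search instead.

```python
BLANK = 0  # assumption: class index 0 is reserved for the CTC blank


def best_path(probabilities):
    """Pick the most likely class index at every time step."""
    return [max(range(len(step)), key=lambda k: step[k]) for step in probabilities]


def ctc_greedy_decode(path, alphabet):
    """Collapse repeated indices, then drop blanks, to get the final text."""
    chars = []
    previous = BLANK
    for index in path:
        if index != previous and index != BLANK:
            chars.append(alphabet[index - 1])
        previous = index
    return "".join(chars)


# Hypothetical alphabet and per-time-step output "hh-ell-lo" ('-' is the blank).
alphabet = "helo"
path = [1, 1, 0, 2, 3, 3, 0, 3, 4]
decoded = ctc_greedy_decode(path, alphabet)
print(decoded)
```

The blank between the two runs of "l" is what lets the decoder keep a genuine double letter while still collapsing repeated frames of the same character.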

The training data for this pipeline consists of cropped word image regions and their corresponding ground truth text. Since a large amount of training data was required to make the model converge, we built a separate data creation pipeline. In it, we first get the cropped word regions from the document and then feed them into a third-party OCR engine to get the corresponding text. We benchmarked this data against manually created human data. The manual data was in turn verified by a two-stage human process to make sure the labels are right.
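
One simple way such a comparison between machine-generated and manual labels could be scored is a character error rate based on edit distance. The sketch below is an assumption about how the check might look, not the metric the team actually used, and the sample labels are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(predicted, reference):
    """Edit distance normalised by the length of the reference label."""
    if not reference:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, reference) / len(reference)


def disagreements(ocr_labels, manual_labels, threshold=0.0):
    """Indices of samples whose OCR label differs from the manual label."""
    return [i for i, (o, m) in enumerate(zip(ocr_labels, manual_labels))
            if character_error_rate(o, m) > threshold]


# Hypothetical labels for two cropped word images.
flagged = disagreements(["RAHUL", "KUMAR"], ["RAHUL", "KUMAR1"])
print(flagged)
```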

We achieved impressive results with the model. Below is a sample output from the model.

 

Time for results

At last, we combined all four key components into a single end-to-end pipeline. The algorithm now takes an input image of a document and gives the corresponding OCR text as output. Below is a sample input and output for a document.
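
The way the four stages chain together can be sketched as a single function. Each argument below is a hypothetical callable standing in for one of the trained models, and the toy stand-ins exist only so the sketch runs on its own.

```python
def run_ocr_pipeline(image, crop, deskew, localize, recognize):
    """Chain the four stages; each argument is a callable standing in for a model."""
    document = crop(image)         # CropNET: remove the background
    upright = deskew(document)     # RotationNET: bring the document to zero degrees
    regions = localize(upright)    # Text localizer: find the text regions
    return [recognize(upright, region) for region in regions]  # Word classifier


# Toy stand-ins; the "image" is a plain dict purely for illustration.
toy_image = {"background": "desk", "angle": 90, "words": ["NAME", "SIGNZY"]}


def toy_crop(image):
    return {key: value for key, value in image.items() if key != "background"}


def toy_deskew(document):
    return {**document, "angle": 0}


def toy_localize(document):
    return list(range(len(document["words"])))


def toy_recognize(document, region):
    return document["words"][region]


text = run_ocr_pipeline(toy_image, toy_crop, toy_deskew, toy_localize, toy_recognize)
print(text)
```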

 

Now the engine was ready to face our quality analysis team for validation. They benchmarked the pipeline against popular global third party OCR engines on our custom validation set. Below are the test results for certain important documents we were handling.

 

We tested our OCR engine against other top engines in different scenarios. These include cases with no background, different backgrounds, high brightness, and low brightness. The results show that we are able to perform better than the popular, well-known OCR engines in most scenarios.

Productionization

The pipeline was now built and tested. But it was still not ready to face the real world. Some of the challenges in productionizing the system are listed below.

  1. Our OCR engine was using a GPU for inference. But since we wanted the solution to be usable by our clients without any change in their infrastructure, we removed all the GPU dependencies and rewrote the code to run on CPU.
  2. To serve a large number of requests more efficiently, we built a queueing mechanism.
  3. For easier integration with existing client infrastructures, we provided the solution as a REST API.
  4. Finally, the whole pipeline was containerized to ease deployment at enterprises.
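
A queueing mechanism like the one in point 2 can be sketched with the Python standard library alone. This is only an assumed shape (a bounded queue drained by a worker thread), with a hypothetical `ocr` callable in place of the real engine; the actual production design is not described in this post.

```python
import queue
import threading


def start_worker(jobs, results, ocr):
    """Start a background thread that runs `ocr` on queued (job_id, image) pairs."""
    def loop():
        while True:
            job = jobs.get()
            if job is None:          # sentinel: stop the worker
                jobs.task_done()
                break
            job_id, image = job
            try:
                results[job_id] = ocr(image)
            except Exception as error:   # keep the worker alive on bad input
                results[job_id] = error
            finally:
                jobs.task_done()

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


# A bounded queue makes producers wait when the engine falls behind.
jobs = queue.Queue(maxsize=100)
results = {}
worker = start_worker(jobs, results, ocr=lambda image: image.upper())

for job_id, image in enumerate(["pan card", "passport"]):
    jobs.put((job_id, image))
jobs.put(None)   # ask the worker to stop once the queue is drained
jobs.join()      # wait until every queued item has been processed
print(results)
```

A REST endpoint would then only need to enqueue the request and return or poll for the stored result, which keeps slow CPU inference from blocking incoming connections.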

Summary

Thus the mammoth task of building a modern OCR pipeline was accomplished. Special thanks to my team members Nishant and Harshit for making this project successful. One of the key takeaways from the project was that if you have an exciting problem and a passionate team in hand, you can make the impossible possible. I could not explain a lot of the steps in detail since I had to keep the blog short. Do write to me if you have any queries.
