{"id":2871,"date":"2018-11-07T17:08:08","date_gmt":"2018-11-07T17:08:08","guid":{"rendered":"https:\/\/www.appservgrid.com\/paw92\/?p=2871"},"modified":"2018-11-11T23:55:00","modified_gmt":"2018-11-11T23:55:00","slug":"introducing-pydbgen-a-random-dataframe-database-table-generator","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/11\/07\/introducing-pydbgen-a-random-dataframe-database-table-generator\/","title":{"rendered":"Introducing pydbgen: A random dataframe\/database table generator"},"content":{"rendered":"<p>When you start learning data science, often your biggest worry is not the algorithms or techniques but getting access to raw data. While there are many high-quality, real-life datasets available on the web for trying out cool machine learning techniques, I&#8217;ve found that the same is not true when it comes to learning SQL.<\/p>\n<p>For data science, having a basic familiarity with SQL is almost as important as knowing how to write code in Python or R. But it&#8217;s far easier to find toy datasets on Kaggle than it is to access a large enough database with real data (such as name, age, credit card, social security number, address, birthday, etc.) specifically designed or curated for machine learning tasks.<\/p>\n<p>Wouldn&#8217;t it be great to have a simple tool or library to generate a large database with multiple tables filled with data of your own choice?<\/p>\n<p>Aside from beginners in data science, even seasoned software testers may find it useful to have a simple tool where, with a few lines of code, they can generate arbitrarily large data sets with random (fake), yet meaningful entries.<\/p>\n<p>For this reason, I am glad to introduce a lightweight Python library called <a href=\"https:\/\/github.com\/tirthajyoti\/pydbgen\" target=\"_blank\" rel=\"noopener\">pydbgen<\/a>. 
In this article, I&#8217;ll briefly share some information about the package, and you can learn much more <a href=\"http:\/\/pydbgen.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noopener\">by reading the docs<\/a>.<\/p>\n<h2>What is pydbgen?<\/h2>\n<p>Pydbgen is a lightweight, pure-Python library that generates random but useful entries (e.g., name, address, credit card number, date, time, company name, job title, and license plate number) and saves them in a pandas DataFrame object, as an SQLite table in a database file, or in a Microsoft Excel file.<\/p>\n<p><a href=\"http:\/\/lxer.com\/module\/newswire\/ext_link.php?rid=262531\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>When you start learning data science, often your biggest worry is not the algorithms or techniques but getting access to raw data. While there are many high-quality, real-life datasets available on the web for trying out cool machine learning techniques, I&#8217;ve found that the same is not true when it comes to learning SQL. 
For &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/11\/07\/introducing-pydbgen-a-random-dataframe-database-table-generator\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Introducing pydbgen: A random dataframe\/database table generator&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2871","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/2871","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/comments?post=2871"}],"version-history":[{"count":2,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/2871\/revisions"}],"predecessor-version":[{"id":3081,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/2871\/revisions\/3081"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/media?parent=2871"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/categories?post=2871"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/tags?post=2871"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}