Loan data from Lending Club
This data set represents thousands of loans made through the Lending Club platform, which allows individuals to lend to other individuals. Of course, not all loans are created equal. Someone who is essentially a sure bet to pay back a loan will have an easier time getting a loan with a low interest rate than someone who appears to be riskier. And for people who are very risky? They may not even get a loan offer, or they may not have accepted the loan offer due to a high interest rate. It is important to keep that last part in mind, since this data set only represents loans actually made, i.e., do not mistake this data for loan applications!
A data frame with 10,000 observations on the following 55 variables.
Number of years in the job, rounded down. If longer than 10 years, then this is represented by the value
Two-letter state code.
The ownership status of the applicant's residence.
Type of verification of the applicant's income.
If this is a joint application, then the annual income of the two parties applying.
Type of verification of the joint income.
Debt-to-income ratio for the two parties.
Delinquencies on lines of credit in the last 2 years.
Months since the last delinquency.
Year of the applicant's earliest line of credit.
Inquiries into the applicant's credit during the last 12 months.
Total number of credit lines in this applicant's credit history.
Number of currently open lines of credit.
Total available credit, e.g. if only credit cards, then the total of all the credit limits. This excludes a mortgage.
Total credit balance, excluding a mortgage.
Number of collections in the last 12 months. This excludes medical collections.
The number of derogatory public records, which roughly means the number of times the applicant failed to pay.
Months since the last time the applicant was 90 days late on a payment.
Number of accounts where the applicant is currently delinquent.
The total amount that the applicant has had against them in collections.
Number of installment accounts, which are (roughly) accounts with a fixed payment amount and period. A typical example might be a 36-month car loan.
Number of new lines of credit opened in the last 24 months.
Number of months since the last credit inquiry on this applicant.
Number of satisfactory accounts.
Number of current accounts that are 120 days past due.
Number of current accounts that are 30 days past due.
Number of currently active bank cards.
Total of all bank card limits.
Total number of credit card accounts in the applicant's history.
Total number of currently open credit card accounts.
Number of credit cards that are carrying a balance.
Number of mortgage accounts.
Percent of all lines of credit where the applicant was never delinquent.
a numeric vector
Number of bankruptcies listed in the public record for this applicant.
The category for the purpose of the loan.
The type of application: either
The amount of the loan the applicant received.
The number of months of the loan the applicant received.
Interest rate of the loan the applicant received.
Monthly payment for the loan the applicant received.
Grade associated with the loan.
Detailed grade associated with the loan.
Month the loan was issued.
Status of the loan.
Initial listing status of the loan. (I think this has to do with whether the lender provided the entire loan or if the loan is across multiple lenders.)
Disbursement method of the loan.
Current balance on the loan.
Total that has been paid on the loan by the applicant.
The difference between the original loan amount and the current balance on the loan.
The amount of interest paid so far by the applicant.
Late fees paid by the applicant.
This data comes from Lending Club (https://www.lendingclub.com/info/statistics.action), which provides a very large, open set of data on the people who received loans through their platform.
loans_full_schema
#> # A tibble: 10,000 × 55
#>    emp_title        emp_length state homeownership annual_income verified_income
#>    <chr>                 <dbl> <fct> <fct>                 <dbl> <fct>
#>  1 "global config …          3 NJ    MORTGAGE              90000 Verified
#>  2 "warehouse offi…         10 HI    RENT                  40000 Not Verified
#>  3 "assembly"                3 WI    RENT                  40000 Source Verified
#>  4 "customer servi…          1 PA    RENT                  30000 Not Verified
#>  5 "security super…         10 CA    RENT                  35000 Verified
#>  6 ""                       NA KY    OWN                   34000 Not Verified
#>  7 "hr "                    10 MI    MORTGAGE              35000 Source Verified
#>  8 "police"                 10 AZ    MORTGAGE             110000 Source Verified
#>  9 "parts"                  10 NV    MORTGAGE              65000 Source Verified
#> 10 "4th person"              3 IL    RENT                  30000 Not Verified
#> # ℹ 9,990 more rows
#> # ℹ 49 more variables: debt_to_income <dbl>, annual_income_joint <dbl>,
#> #   verification_income_joint <fct>, debt_to_income_joint <dbl>,
#> #   delinq_2y <int>, months_since_last_delinq <int>,
#> #   earliest_credit_line <dbl>, inquiries_last_12m <int>,
#> #   total_credit_lines <int>, open_credit_lines <int>,
#> #   total_credit_limit <int>, total_credit_utilized <int>, …
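As a rough illustration of how these columns might be summarised, the sketch below rebuilds the ten printed rows by hand in base R and computes a few group summaries. The object name `loans_head` is hypothetical, the values are copied from the printed output above, and `emp_title` is left out because the printout truncates it. The same calls should work on the full `loans_full_schema` data frame once it is loaded.

```r
# First ten rows of selected columns, copied from the printed output.
# `loans_head` is a hypothetical name used only for this sketch.
loans_head <- data.frame(
  emp_length      = c(3, 10, 3, 1, 10, NA, 10, 10, 10, 3),
  state           = c("NJ", "HI", "WI", "PA", "CA", "KY", "MI", "AZ", "NV", "IL"),
  homeownership   = c("MORTGAGE", "RENT", "RENT", "RENT", "RENT",
                      "OWN", "MORTGAGE", "MORTGAGE", "MORTGAGE", "RENT"),
  annual_income   = c(90000, 40000, 40000, 30000, 35000,
                      34000, 35000, 110000, 65000, 30000),
  verified_income = c("Verified", "Not Verified", "Source Verified",
                      "Not Verified", "Verified", "Not Verified",
                      "Source Verified", "Source Verified",
                      "Source Verified", "Not Verified"),
  stringsAsFactors = TRUE
)

# Number of borrowers in each home ownership category.
ownership_counts <- table(loans_head$homeownership)

# Median annual income within each home ownership category.
median_income <- tapply(loans_head$annual_income,
                        loans_head$homeownership,
                        median)

# emp_length contains a missing value (row 6), so drop NAs explicitly.
mean_emp_length <- mean(loans_head$emp_length, na.rm = TRUE)

print(ownership_counts)
print(median_income)
print(mean_emp_length)
```

On these ten rows the median income is 77500 for the MORTGAGE group and 35000 for the RENT group. Ten rows are far too few to say anything about the 10,000 loans in the full data set; the point is only the shape of the calls.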